Christopher King

Karma: 826

@theking@mathstodon.xyz

Christopher King May 2, 2025, 7:10 PM
1 point
0
in reply to: Tapatakt’s comment on: LDT (and everything else) can be irrational
In the first case, the problem is “symmetrical ultimatum game against X”, in which $9 rock does get $9.

In the second case you are correct, in the problem “symmetrical ultimatum game against $9 rock” $9 rock gets $0.

Christopher King Apr 29, 2025, 1:16 PM
1 point
0
on: God Help Us, Let’s Try To Understand Friston On Free Energy
Okay, I think I figured out the dark room thing: The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as evidence

Christopher King Apr 28, 2025, 3:15 PM
1 point
0
in reply to: J Bostock’s comment on: The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as *evidence*
Yeah my understanding is that FEP is meant to be quite general, the P and Q are doing a lot of the theory’s work for it.

Chapter 5 describes how you might apply it to the human brain in particular.

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as evidence

Christopher KingApr 27, 2025, 9:55 PM

9 points

5 comments5 min readLW link

METR’s preliminary evaluation of o3 and o4-mini

Christopher KingApr 16, 2025, 8:23 PM

14 points

7 comments1 min readLW link

(metr.github.io)

Christopher King Apr 11, 2025, 2:26 PM
1 point
0
in reply to: Grace Kind’s comment on: Alignment Faking Revisited: Improved Classifiers and Open Source Extensions
“Solid values” would mean no compliance and it not caring whether it is training.

Alignment faking means it complies with harmful instructions more often in training. It is technically a jail-break. We would prefer it is true to its values both in training and out.

Christopher King Mar 28, 2025, 3:40 PM
12 points
7
on: Mistral Large 2 (123B) exhibits alignment faking

We hope sharing this will help other researchers perform more cost-effective experiments on alignment faking.

And also it is a cheap example of a model organism of misalignment.

Christopher King Mar 24, 2025, 9:39 PM
2 points
−2
on: Will Jesus Christ return in an election year?
I think this would be fixed if they didn’t force yes and no to add to 100%. If they have the same interest rate, the price ratio would reveal the true odds.

The problem is you’re forcing a one year loan for $1 to add up to $1 in the present. It should add up to less than $1.

Christopher King Mar 24, 2025, 9:15 PM
1 point
0
in reply to: Davidmanheim’s comment on: LDT (and everything else) can be irrational
I’m assuming the LDT agent knows what the game is and who their opponent is.

Christopher King Mar 24, 2025, 3:47 PM
1 point
0
in reply to: Davidmanheim’s comment on: LDT (and everything else) can be irrational
Towards the end of the post in the No agent is rational in every problem section, I provided a more general argument. I was assuming LDT would fall under case 1, but if not then case 2 demonstrates it is irrational.

Christopher King Mar 24, 2025, 3:46 PM
1 point
0
in reply to: Davidmanheim’s comment on: How to Give in to Threats (without incentivizing them)
Towards the end of the post in the No agent is rational in every problem section, I provided a more general argument. I was assuming LDT would fall under case 1, but if not case 2 will demonstrate it is irrational.

Christopher King Mar 21, 2025, 2:59 PM
1 point
0
on: A computational no-coincidence principle
Ultimately, though, we are not wedded to our particular formulation. Perhaps there is some clever sampling-based verifier that “trivializes” our conjecture as well, in which case we would want to revise it.
I think your goal should be to show that your abstract conjecture implies the concrete result you’re after, or is even equivalent to it.
At ARC, we are interested in finding explanations of neural network behavior. Concretely, a trained neural net (such as GPT-4) exhibits a really surprising property: it gets low loss on the training set (far lower than a random neural net).
We can formalize this in a similar way as the reversible circuit conjecture. Here’s a rough sketch:
Transformer performance no-coincidence conjecture: Consider a computable process that randomly generates text. The distribution has significantly lower entropy than the uniform distribution. Consider the property P(T) that says “the transformer T gets low average loss when predicting this process”. There is a deterministic polynomial time verifier V(T, π) such that:
1. P(T) implies that there exists a polynomial length π with V(T,π) = 1.
2. For 99% of transformers T, there is no π with V(T,π) = 1.
Note that “ignore π, and then test T on a small number of inputs” doesn’t work. P is only asking if T has low average loss, so you can’t falsify P with a small number of inputs.

Christopher King Mar 21, 2025, 12:00 AM
1 point
0
in reply to: Gunnar_Zarncke’s comment on: How far along Metr’s law can AI start automating or helping with alignment research?
I mean, beating a chess engine in 2005 might be a “years-long task” for a human? The time METR is measuring is how long it would hypothetically take a human to do the task, not how long it takes the AI.

Christopher King Mar 20, 2025, 11:54 PM
1 point
0
in reply to: Gunnar_Zarncke’s comment on: How far along Metr’s law can AI start automating or helping with alignment research?
What is the absurd conclusion?

Christopher King Mar 20, 2025, 5:42 PM
1 point
0
in reply to: tailcalled’s comment on: How far along Metr’s law can AI start automating or helping with alignment research?
You’re saying that if you assigned 1 human contractor the task of solving superalignment, they would succeed after ~3.5 billion years of work? 🤔 I think you misunderstood what the y-axis on the graph is measuring.

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher KingMar 20, 2025, 3:58 PM

20 points

21 comments1 min readLW link

Christopher King Mar 20, 2025, 1:44 PM
17 points
5
on: METR: Measuring AI Ability to Complete Long Tasks
I think the most mysterious part of this trend is that the x-axis is release date. Very useful but mysterious.

No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!

Christopher King28 Dec 2024 16:05 UTC

7 points

11 comments1 min readLW link

Christopher King 27 Dec 2024 17:38 UTC
5 points
0
on: The Field of AI Alignment: A Postmortem, and What To Do About It
I think there is an obvious signal that could be used: a forecast of how much MIRI will like the research when asked in 5 years. (Note that I don’t mean just asking MIRI now, but rather something like prediction markets or super-forecasters to predict what MIRI will say 5 years from now.)

Basically, if the forecast is above average, anyone who trusts MIRI should fund them.

How I saved 1 human life (in expectation) without overthinking it

Christopher King22 Dec 2024 20:53 UTC

19 points

0 comments4 min readLW link

Christopher King

The Way You Go Depends A Good Deal On Where You Want To Get: FEP min­i­mizes sur­prise about ac­tions us­ing prefer­ences about the fu­ture as *ev­i­dence*

METR’s pre­limi­nary eval­u­a­tion of o3 and o4-mini

[Question] How far along Metr’s law can AI start au­tomat­ing or helping with al­ign­ment re­search?

No, the Poly­mar­ket price does not mean we can im­me­di­ately con­clude what the prob­a­bil­ity of a bird flu pan­demic is. We also need to know the in­ter­est rate!

How I saved 1 hu­man life (in ex­pec­ta­tion) with­out over­think­ing it

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as evidence

METR’s preliminary evaluation of o3 and o4-mini

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

No, the Polymarket price does not mean we can immediately conclude what the probability of a bird flu pandemic is. We also need to know the interest rate!

How I saved 1 human life (in expectation) without overthinking it