TheoR

Karma: 7

AI Safety Engineer at Palisade Research.

TheoR 15 Mar 2026 14:20 UTC
1 point
0
on: On The Independence Axiom
If you accept both properties and you violate independence, you can be money-pumped. Here is how it works, concretely. Suppose your preference between gambles A and B depends on what the common component C is (as the independence axiom says it shouldn’t). Before the uncertainty resolves, you evaluate the compound lottery holistically and prefer the plan involving B (because, in combination with the C branch, B produces a better overall distribution). But then the coin comes up heads, the C branch is now off the table, and you find yourself choosing between A and B in isolation. Consequentialism says you should evaluate based on what’s still possible. And in isolation, you prefer A. So you switch from your plan (B) to your current preference (A). You are dynamically inconsistent.
Can someone clarify this passage for me? I find myself increasingly confused. Earlier, we assumed the agent can form a plan: “if the coin comes up heads (no C), I will choose A; if the coin comes up tails, I will choose B (with C)”. How can I be money-pumped? I violate neither dynamic consistency nor consequentialism. Yet I violate independence and still can’t be money-pumped. I can’t be convinced to pre-commit to either B or A, since there are no predictors involved and I can just postpone my actual choice.

Edit: Actually, I don’t violate independence either; these are simply different outcomes. So I don’t understand this argument at all.
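
For concreteness, here is a toy version of the reversal the quoted passage describes. This is only a sketch under my own assumptions: the payoffs are the standard common-ratio numbers, and `value` is a made-up "certainty effect" valuation (not a well-behaved decision theory, and not anything from the post). It only shows that the ex ante ranking and the in-isolation ranking can disagree; it does not settle whether an agent who can postpone the choice is actually exploitable.

```python
def value(lottery):
    """Toy non-expected-utility valuation (hypothetical, for illustration).

    lottery: list of (probability, payoff) pairs.
    Sure things get full weight; anything uncertain is discounted by 0.7.
    """
    total = 0.0
    for p, x in lottery:
        weight = 1.0 if p == 1.0 else 0.7 * p
        total += weight * x
    return total


def mix_with_nothing(lottery, q):
    """Compound lottery: with probability q play `lottery`, otherwise get 0.

    The 'otherwise get 0' branch plays the role of the common component C.
    """
    merged = {}
    for p, x in lottery:
        merged[x] = merged.get(x, 0.0) + q * p
    merged[0] = merged.get(0, 0.0) + (1.0 - q)
    return [(p, x) for x, p in merged.items()]


# The two gambles compared in isolation.
A = [(1.0, 3000)]             # 3000 for sure
B = [(0.8, 4000), (0.2, 0)]   # 80% chance of 4000

# The same two gambles embedded in a plan with a 75% chance of the C branch.
plan_A = mix_with_nothing(A, 0.25)
plan_B = mix_with_nothing(B, 0.25)

prefers_B_ex_ante = value(plan_B) > value(plan_A)
prefers_A_in_isolation = value(A) > value(B)

print("Ex ante, prefers the plan with B:", prefers_B_ex_ante)
print("In isolation, prefers A:", prefers_A_in_isolation)
```

With this particular valuation both comparisons come out true, which is the pattern the post calls a violation of independence.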

TheoR 15 Mar 2026 13:34 UTC
1 point
0
on: On The Independence Axiom
Here is the specific confusion that matters for our purposes. When someone says “a rational agent maximizes expected utility,” this sounds, to a casual listener, like it means “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighted sum of how much they value each possible result.
This seems untrue. “A rational agent computes the probability-weighted average of their subjective values across all possible outcomes” isn’t the same as the agent taking the expected value of f1. The expected value of f1 doesn’t carry any meaning at all, because f1 is ordinal, not cardinal. I could prefer two apples to one apple only slightly, yet f1(two apples) could be vastly larger than f1(one apple) without contradicting Debreu’s theorem.

What this actually says is that the agent takes f2 and computes its expected value across all possible outcomes. This is exactly what a VNM agent does per the original theorem, and it is true, per my understanding, that agents “value gambles at the weighted sum of how much they value each possible result”.
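
A small sketch of why the expectation of an ordinal function is not meaningful. The two functions below are my own illustrative choices, not anything from the post: they rank the sure outcomes identically, so both are equally valid ordinal representations, yet their expectations rank the same pair of gambles in opposite ways.

```python
import math

# Sure outcomes: number of apples.
outcomes = [0, 1, 2]


def f_concave(apples):
    """One ordinal representation of 'more apples is better'."""
    return math.sqrt(apples)


def f_convex(apples):
    """Another representation of the same ordering (a monotone transform)."""
    return apples ** 2


def expected(f, lottery):
    """Probability-weighted average of f over (probability, outcome) pairs."""
    return sum(p * f(x) for p, x in lottery)


sure_one = [(1.0, 1)]              # one apple for certain
coin_flip = [(0.5, 0), (0.5, 2)]   # zero or two apples, equal odds

# Both functions order the sure outcomes the same way.
same_ordering = sorted(outcomes, key=f_concave) == sorted(outcomes, key=f_convex)

# Their expectations nevertheless disagree about the gambles.
concave_prefers_sure = expected(f_concave, sure_one) > expected(f_concave, coin_flip)
convex_prefers_flip = expected(f_convex, coin_flip) > expected(f_convex, sure_one)

print(same_ordering, concave_prefers_sure, convex_prefers_flip)
```

Since nothing about preferences over sure outcomes picks one of these functions over the other, averaging f1 tells you nothing; the averaging only becomes meaningful for f2, which is pinned down up to positive affine transformation.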

TheoR 15 Mar 2026 12:27 UTC
1 point
0
on: VNM expected utility theory: uses, abuses, and interpretation
I think the most natural fix within the VNM theory is to just say S’ and D’ are the events “car is awarded to son/daughter based on a coin toss”, which are slightly better than S and D themselves, and that F is really 0.5S’ + 0.5D’. Unfortunately, such modifications undermine the applicability of the VNM theorem, which implicitly assumes that the source of probabilities itself is insignificant to the outcomes for the agent. Luckily, Bolker⁴ has devised an axiomatic theory whose theorems will apply without such assumptions, at the expense of some uniqueness results. I’ll have another occasion to post on this later.
I don’t know whether the author has commented further on this. I don’t think this undermines the applicability of VNM. If the agent cares whether the car was assigned via a coin toss, then the relevant consequences aren’t just S and D, but richer outcomes like S′ = “son gets car via coin toss” and D′ = “daughter gets car via coin toss.” In that case, the original model just used too coarse a consequence space; VNM can still be applied to lotteries over the refined outcomes. What would challenge VNM is insisting that two lotteries over the same fully specified outcomes can still differ in value purely because of how the probabilities are generated. However, if we assume a deterministic universe, we are allowed to expand the outcome space indefinitely until there is no probability involved, so I’m having a hard time imagining such a scenario.
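
To illustrate the refinement point, here is a minimal sketch with made-up utility numbers (the values 1.0 and 1.2 and the names `S_coin` and `D_coin` are my assumptions, standing in for S′ and D′). With the coarse outcomes, an expected utility maximizer can never rank the coin flip above both sure outcomes; with the refined outcomes, the same preference is an ordinary expected utility comparison.

```python
def expected_utility(u, lottery):
    """Expected utility of a lottery given as {outcome: probability}."""
    return sum(p * u[outcome] for outcome, p in lottery.items())


# Hypothetical utilities. S_coin and D_coin stand for S' and D':
# "son/daughter gets the car via a coin toss".
u = {"S": 1.0, "D": 1.0, "S_coin": 1.2, "D_coin": 1.2}

# Coarse model: the coin flip F is a lottery over the plain outcomes.
coarse_F = {"S": 0.5, "D": 0.5}

# Refined model: the coin flip F is a lottery over the richer outcomes.
refined_F = {"S_coin": 0.5, "D_coin": 0.5}

best_sure = max(u["S"], u["D"])

# An average of u(S) and u(D) cannot exceed the larger of the two.
coarse_cannot_beat_both = expected_utility(u, coarse_F) <= best_sure

# Over the refined outcomes, preferring F to both S and D is consistent.
refined_beats_both = expected_utility(u, refined_F) > best_sure

print(coarse_cannot_beat_both, refined_beats_both)
```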

A “Scaling Monosemanticity” Explainer

latterframe and TheoR

29 Jun 2024 17:50 UTC

10 points

0 comments · 3 min read · LW link
