AI Safety Engineer at Palisade Research.
Here is the specific confusion that matters for our purposes. When someone says “a rational agent maximizes expected utility,” this sounds, to a casual listener, like it means “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighted sum of how much they value each possible result.
This seems untrue. “A rational agent computes the probability-weighted average of their subjective values across all possible outcomes” is not the same as the agent taking the expected value of f1. The expected value of f1 carries no meaning at all, because f1 is ordinal, not cardinal. I could prefer two apples to one apple only slightly, yet f1(two apples) could be vastly larger than f1(one apple), without violating any of Debreu’s theorems.
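As a toy sketch of this point (my own numbers, not from the post): two utility functions that encode the exact same ordinal ranking of outcomes can assign wildly different “expected values” to the same gamble, so the expected value of an ordinal f1 tells us nothing about preferences over lotteries.

```python
# Two ordinal utility functions encoding the SAME preference ordering
# over apple counts (0 < 1 < 2); only the ranking is meaningful.
f1 = {0: 0.0, 1: 1.0, 2: 2.0}     # one representation
g1 = {0: 0.0, 1: 1.0, 2: 1000.0}  # monotone transform: same ranking

# A 50/50 gamble between 0 apples and 2 apples.
lottery = [(0.5, 0), (0.5, 2)]

def expected(u, lottery):
    """Probability-weighted average of u over the lottery's outcomes."""
    return sum(p * u[x] for p, x in lottery)

# Under f1 the gamble's "expected utility" happens to equal u(1 apple)...
print(expected(f1, lottery), f1[1])  # 1.0 1.0
# ...but under g1, which encodes the SAME ordinal preferences,
# the gamble looks vastly better than 1 apple for sure.
print(expected(g1, lottery), g1[1])  # 500.0 1.0
```

Since both functions satisfy the same ordinal theory, nothing pins down which “expected value” is the right one; that is the sense in which the quantity is meaningless for f1.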
What this actually says is that the agent takes f2 and computes its expected value across all possible outcomes. This is exactly what a VNM agent does per the original theorem, and it is true, on my understanding, that such agents “value gambles at the weighted sum of how much they value each possible result.”
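A minimal sketch of that claim (illustrative numbers, not from the post): a VNM agent values a lottery at the probability-weighted sum of a cardinal utility f2 over its results, and positive affine transforms of f2 are exactly the ones that preserve the ranking of all lotteries.

```python
# A toy cardinal (VNM) utility f2 over outcomes; values are illustrative.
f2 = {"one apple": 1.0, "two apples": 1.2}

def lottery_value(u, lottery):
    """A VNM agent values a lottery at the probability-weighted
    sum of the utilities of its possible results."""
    return sum(p * u[x] for p, x in lottery)

gamble = [(0.5, "one apple"), (0.5, "two apples")]
print(lottery_value(f2, gamble))  # 0.5*1.0 + 0.5*1.2 = 1.1

# Positive affine transforms (a*u + b with a > 0) rescale every
# lottery's value the same way, so they preserve the ranking of
# ALL lotteries -- this is the sense in which f2 is cardinal.
f2_scaled = {k: 3 * v + 7 for k, v in f2.items()}
print(lottery_value(f2_scaled, gamble))  # 3*1.1 + 7 = 10.3
```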
I think the most natural fix within the VNM theory is to just say S′ and D′ are the events “car is awarded to son/daughter based on a coin toss”, which are slightly better than S and D themselves, and that F is really 0.5S′ + 0.5D′. Unfortunately, such modifications undermine the applicability of the VNM theorem, which implicitly assumes that the source of the probabilities is itself insignificant to the outcomes for the agent. Luckily, Bolker [4] has devised an axiomatic theory whose theorems apply without such assumptions, at the expense of some uniqueness results. I’ll have another occasion to post on this later.
I don’t know whether the author has commented further on this. I don’t think this undermines the applicability of VNM. If the agent cares whether the car was assigned via a coin toss, then the relevant consequences aren’t just S and D, but richer outcomes like S′ = “son gets car via coin toss” and D′ = “daughter gets car via coin toss”. In that case, the original model just used too coarse a consequence space; VNM can still be applied to lotteries over the refined outcomes. What would challenge VNM is insisting that two lotteries over the same fully specified outcomes can still differ in value purely because of how the probabilities are generated. But if we assume a deterministic universe, we are allowed to expand the outcome space indefinitely until no probability is involved, so I have a hard time imagining such a scenario.
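The refinement move can be sketched numerically (my own illustrative utilities, assuming the coin-toss outcomes are worth a small fairness premium, as the quoted passage suggests): once S′ and D′ are distinct outcomes, the fair-coin lottery F is a lottery over them, and ordinary expected utility handles the agent’s taste for fairness with no change to the theorem.

```python
# Coarse outcomes: "S" (son gets car), "D" (daughter gets car).
# Refined outcomes: same assignment, but reached via a fair coin toss,
# which this agent values slightly more (a fairness premium of 0.1;
# the specific numbers are hypothetical).
u = {
    "S": 1.0, "D": 1.0,            # coarse outcomes
    "S_toss": 1.1, "D_toss": 1.1,  # refined outcomes: toss felt fair
}

def lottery_value(u, lottery):
    """Standard VNM valuation: probability-weighted sum of utilities."""
    return sum(p * u[x] for p, x in lottery)

# On the refined consequence space, F is really 0.5*S' + 0.5*D':
F = [(0.5, "S_toss"), (0.5, "D_toss")]
print(lottery_value(u, F))  # 1.1, strictly above u["S"] = u["D"] = 1.0
```

The point is that nothing here depends on the source of the probabilities once the outcomes are described finely enough; the preference for the coin toss lives inside the outcomes themselves.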
Can someone clarify this passage for me? I find myself increasingly confused. Earlier, we assumed the agent can form a plan: “if the coin comes up heads (no C), I will choose A; if the coin comes up tails, I will choose B (with C)”. How can I be money pumped? I violate neither dynamic consistency nor consequentialism. Yet I violate independence, and still can’t be money pumped. I can’t be convinced to pre-commit to either A or B, since there are no predictors involved, and I can simply postpone my actual choice.
Edit: Actually, I don’t violate independence either; these are simply different outcomes. So I don’t understand this argument at all.