On The Independence Axiom
The Fifth Fourth Postulate of Decision Theory
In 1820, the Hungarian mathematician Farkas Bolyai wrote a desperate letter to his son János, who had become consumed by the same problem that had haunted his father for decades:
“You must not attempt this approach to parallels. I know this way to the very end. I have traversed this bottomless night, which extinguished all light and joy in my life. I entreat you, leave the science of parallels alone… Learn from my example.”
The problem was Euclid’s fifth postulate, the parallel postulate, which states (in one of its equivalent formulations) that through any point not on a given line, there is exactly one line parallel to the given one. For over two thousand years, mathematicians had felt that something was off about it. The other four were short, crisp, self-evident: you can draw a straight line between any two points, you can extend a line indefinitely, you can draw a circle with any center and radius, all right angles are equal. The fifth postulate, by contrast, was long, complicated, and felt more like a theorem that ought to be provable from the others than a foundational assumption standing on its own. Generation after generation of mathematicians attempted to derive it from the remaining four and failed.
Farkas Bolyai begged his son to stay away.
János ignored his father’s advice, but not in the way Farkas feared. Instead of trying to prove the postulate, he asked a question that turned the entire enterprise upside down: what happens if the postulate is simply false? What if you can draw more than one parallel line through a point? Rather than deriving a contradiction (which would have constituted a proof of the fifth postulate by reductio), he found something remarkable: a perfectly consistent geometry, as internally coherent as Euclid’s, just describing a different kind of space. Lobachevsky independently reached the same conclusion around the same time. The parallel postulate was not wrong, exactly, but it was not necessary. It was one choice among several, and the other choices led to geometries that were not merely logically valid but turned out, a century later, to describe the actual physical universe better than Euclid’s flat space ever could.
Roughly two centuries later, people were discussing decision theories and axioms of expected utility. The standard argument went roughly like this: rational agents must maximize expected utility. The von Neumann-Morgenstern theorem proves it. If your behavior violates the axioms, you can be Dutch-booked, turned into a money pump, exploited by anyone who notices the inconsistency. You don’t want to be a money pump, do you? Then you must maximize expected utility. QED.
There are four axioms in the von Neumann-Morgenstern framework: completeness, transitivity, continuity, and independence. Three of them are relatively uncontroversial. The fourth, independence, does enormous structural work; it is the axiom that forces preferences to be linear in probabilities, which is mathematically equivalent to requiring that preferences be representable as the expected value of a utility function.[1] Without independence, you still have a well-defined preference functional (by Debreu’s theorem, given the other axioms), you can still order outcomes, you can still make consistent choices, but you are no longer constrained to maximize expected utility specifically. Independence is the fifth postulate of decision theory.
And just as with Euclid’s fifth, I believe, the resolution is not to keep trying harder to justify it but to ask: what happens when we drop it? What does the resulting decision theory look like? Is it consistent? Is it useful? Does it perhaps describe actual rational behavior better?
The answer, I will argue, is yes on all three counts. Dropping independence does not lead to irrationality or exploitability. Several well-known alternatives to expected utility theory exist precisely because they relax independence, and they do so for a reason. Ergodicity economics, in particular, offers a principled and parsimonious replacement that derives the appropriate evaluation function from the dynamics of the stochastic process the agent is embedded in, rather than postulating an ad hoc utility function and taking its expectation. And the LessWrong community’s own research into updateless decision theory has been converging on the same conclusion from a completely different direction: that the most reflectively stable agents may be precisely those who violate the independence axiom.
A Tale of Two Utilities
Before we get to the main argument, we need to clear up a terminological confusion that silently corrupts reasoning about decision theory at the most basic level. The word “utility” refers to two completely different mathematical objects, and the fact that they share a name is unfortunate. This is well known in decision theory, and you are welcome to skip this section if you already know what I am talking about.
The first object is what we might call preference utility, or f1. This is the function that economists use in consumer theory to represent your subjective valuation of bundles of goods under certainty. If you are indifferent between (2 oranges, 3 apples) and (3 oranges, 2 apples), then f1 is constructed so that f1(2,3) = f1(3,2). The crucial property of f1 is that it is ordinal: the only thing that matters is the ranking it induces, not the numerical values it assigns. If f1 assigns 7 to bundle A and 3 to bundle B, all that means is that you prefer A to B. You could replace f1 with any monotonically increasing transformation of it (squaring it, taking its exponential, adding a million) and it would represent exactly the same preferences. The numbers themselves carry no information beyond the ordering.
The second object is von Neumann-Morgenstern utility, or f2. This is the function that appears inside the expectation operator in expected utility theory. It is constructed not from your preferences over certain bundles but from your preferences over lotteries, over probability distributions on outcomes. The vNM theorem says: if your preferences over lotteries satisfy the four axioms, then there exists a function f2 such that you prefer lottery A to lottery B if and only if E[f2(A)] > E[f2(B)]. Unlike f1, f2 is cardinal: it is defined up to affine transformation (you can multiply it by a positive constant and add any constant, but that’s all). Its curvature carries real information, specifically about your attitudes toward risk. A concave f2 means you are risk-averse; a convex one means you are risk-seeking. This curvature is not a feature of f1 at all, because f1 is defined up to arbitrary monotone transformation, which can make the curvature anything you want.
Now, f2 must agree with f1 on one thing: the ranking of certain (degenerate) outcomes. If you prefer bundle A to bundle B with certainty, then f2(A) > f2(B), just as f1(A) > f1(B). But f2 contains strictly more information than f1. It tells you not just that you prefer A to B, but how much you prefer A to B relative to other pairs, in the precise sense that these ratios of differences determine what gambles you would accept. f1 says nothing about gambles at all.
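Here is a minimal numerical sketch of the difference (the functions and the 50/50 gamble are arbitrary illustrations, not anything canonical): a monotone transformation of f1 represents exactly the same ordinal preferences over certain outcomes, but the same two functions, used as vNM utilities inside an expectation, give opposite verdicts on the same gamble.

```python
import numpy as np

u1 = np.sqrt                     # one candidate "utility" for wealth
u2 = lambda w: np.sqrt(w) ** 3   # a monotone transform of u1 (i.e. w ** 1.5)

wealth = np.array([1.0, 5.0, 9.0])
# As ordinal (f1-style) functions they are indistinguishable: same ranking.
print(np.argsort(u1(wealth)), np.argsort(u2(wealth)))

# As cardinal (f2-style) vNM utilities they disagree about gambles:
# a 50/50 gamble between wealth 1 and 9, versus wealth 5 for certain.
for u in (u1, u2):
    gamble_value = 0.5 * u(1.0) + 0.5 * u(9.0)
    print(gamble_value > u(5.0))   # False for concave u1, True for convex u2
```

Same ordinal preferences, opposite risk attitudes: the curvature that drives the second comparison simply is not a property of f1.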
This distinction is treated in the theoretical literature (see e.g. Mas-Colell, Whinston, and Green, Microeconomic Theory, Chapter 6, which makes the distinction explicit, or Kreps, Notes on the Theory of Choice, which provides a particularly careful treatment). But in practice, in textbooks, in casual discussion, the two get conflated constantly. People say “utility function” without specifying which one they mean, and the ambiguity does real damage.
Here is the specific confusion that matters for our purposes. When someone says “a rational agent maximizes expected utility,” this sounds, to a casual listener, like it means “a rational agent computes the probability-weighted average of their subjective values across all possible outcomes.” In other words, it sounds like the agent takes f1, the function representing how good each outcome feels or how much they value it, and averages it across possible worlds, weighted by probability. This would mean that the agent literally values a gamble at the weighted sum of how much they value each possible result.
But this is only true if f1 and f2 are the same function, and they are generally not. They coincide only in the special case where the agent’s risk attitudes happen to perfectly match the curvature of their subjective value function (which also requires promoting the ordinal f1 to something cardinal, so that it reflects not only the relative ordering of preferences but something like quantifiable subjective values), which is to say, only when the agent treats each possible world as independently valuable and sums across them with no regard for the structure of the gamble as a whole. There is no reason to expect this, and empirically it does not hold.
Why does this matter for what follows? Because before addressing the serious arguments for EUT, I want to dispose of “Argument 0”: the claim that EUT is good because it averages subjective utilities over possible worlds. That argument is invalid, since it tacitly identifies f1 with f2.
Independence Is Sufficient but Not Necessary for Avoiding Exploitation
The strongest case for independence
Let’s steelman the argument for the independence axiom. The best argument, in my view, does not come from raw intuition (“of course irrelevant alternatives shouldn’t matter!”) but from a 1988 result by Peter Hammond. It goes like this.
Consider an agent facing a decision that unfolds over time, in stages. At stage one, some uncertainty is resolved (say, a coin is flipped). Depending on the result, the agent proceeds to stage two, where they must choose between options. Before any uncertainty is resolved, the agent can form a plan: “if the coin comes up heads, I will do X; if tails, I will do Y.” Hammond showed that if you accept two properties of sequential decision-making, then you are logically forced to satisfy the independence axiom.
The first property is dynamic consistency: whatever plan you make before the uncertainty is resolved, you actually follow through on once you arrive at the decision node. Your ex ante plan and your ex post choice agree.
The second property is consequentialism (in the decision-theoretic sense, not the ethical one): when you arrive at a decision node, your choice depends only on what is still possible from that node forward.
If you accept both properties and you violate independence, you can be money-pumped. Here is how it works, concretely. Suppose your preference between gambles A and B depends on what the common component C is (as the independence axiom says it shouldn’t). Before the uncertainty resolves, you evaluate the compound lottery holistically and prefer the plan involving B (because, in combination with the C branch, B produces a better overall distribution). But then the coin comes up heads, the C branch is now off the table, and you find yourself choosing between A and B in isolation. Consequentialism says you should evaluate based on what’s still possible. And in isolation, you prefer A. So you switch from your plan (B) to your current preference (A). You are dynamically inconsistent.
A clever adversary who knows your preferences can now exploit this. They offer you a sequence of trades: pay a small amount to switch from plan-A to plan-B before the coin flip (because ex ante you prefer B in context), then after the coin lands heads, pay a small amount to switch from B to A (because ex post you prefer A in isolation). You have paid twice and ended up exactly where you started.
Sufficiency, not necessity
The argument above is valid. But notice its logical structure very carefully. Hammond proved:
Dynamic consistency + Consequentialism → Independence
This means independence is entailed by the conjunction of dynamic consistency and consequentialism. It does not mean independence is the only way to avoid money pumps. Dynamic consistency alone is what prevents exploitation (if you always follow through on your plans, no one can pump you by getting you to switch mid-stream). And the Hammond result shows that dynamic consistency together with consequentialism implies independence, but this leaves open a crucial possibility: what if you maintain dynamic consistency while giving up consequentialism?
In that case, you can violate independence and still be immune to money pumps. The money pump relies on a specific sequence of events: first, you form a plan; then, partway through, you deviate from it because your local evaluation at the intermediate node (which, under consequentialism, ignores the branches that didn’t happen) differs from your global evaluation when you made the plan. If you simply don’t deviate, if you stick to your plan regardless of what your local preferences at intermediate nodes might suggest, the pump has no lever to pull. The adversary offers you a trade mid-stream, you say “no, I committed to a plan and I’m executing it,” and the pump breaks down.
Maintaining dynamic consistency while giving up consequentialism is a well-developed position in decision theory, and it comes in (at least) two flavors.
Resolute choice
Edward McClennen developed the theory of resolute choice in his 1990 book Rationality and Dynamic Choice. The idea is straightforward: an agent evaluates the entire decision tree before any uncertainty is resolved, selects the plan that is globally optimal over the full trajectory, commits to it, and then executes it step by step without re-evaluating at intermediate nodes.
The cost is giving up consequentialism. At some intermediate node, the resolute chooser may be executing an action that looks suboptimal if you consider only what’s still possible from that node forward. They are choosing it because it was part of the globally optimal plan, and the globally optimal plan was evaluated over the entire tree, including branches that, at this point, have already been resolved.
Is this “irrational”? I don’t think so. It is the same thing that anyone does when they follow through on a commitment that has become locally costly but was globally optimal at the time it was made.
Sophisticated choice
There is a second alternative, which goes in a different direction.[2] A sophisticated chooser accepts that their preferences at future nodes will differ from their current global evaluation, and instead of committing to override those future preferences, they predict them and plan around them. They do backward induction: starting from the last decision node, they figure out what they would actually choose there (given their local preferences at that node), then step back one node and choose optimally given what they know they will do later, and so on back to the first node.[3]
The sophisticated chooser is also immune to money pumps, because they never form a plan that they will later deviate from. Instead, they form a plan that already accounts for their future deviations. The cost is different from resolute choice: instead of sticking to a globally optimal plan despite local temptation, the sophisticated chooser settles for a plan that may be dominated from the ex ante perspective but is at least self-consistent in the sense that they will actually follow through on it.
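To make the backward-induction procedure concrete, here is a minimal sketch over a hypothetical toy tree (all names and values are invented for illustration): future selves at interior nodes choose by their own local values, and the planner at the root maximizes ex ante value given that prediction.

```python
def continuation(node):
    """What will actually happen from here on: future selves choose by 'local'."""
    if "options" not in node:                     # leaf outcome
        return node["global"], []
    # the future self at this node picks the branch it locally prefers
    name, child = max(node["options"].items(), key=lambda kv: kv[1]["local"])
    value, path = continuation(child)
    return value, [name] + path

# Hypothetical tree: the plan "B then b1" is globally best (value 10), but the
# future self at node B locally prefers b2, which is globally bad (value 3).
tree = {
    "A": {"local": 1, "global": 8},
    "B": {"local": 2, "options": {
        "b1": {"local": 5, "global": 10},
        "b2": {"local": 6, "global": 3},
    }},
}

# Sophisticated choice at the root: maximize ex ante value GIVEN the prediction.
plan = max(tree.items(), key=lambda kv: continuation(kv[1])[0])
print(plan[0], continuation(plan[1]))   # -> A (8, [])
```

The sophisticated chooser settles for A (ex ante value 8) even though the plan “B, then b1” would be worth 10, because they predict that upon arriving at B they would in fact deviate to b2.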
Sophisticated choice is less elegant than resolute choice, and for our purposes less interesting, but it is worth mentioning because it demonstrates the same structural point: money-pump immunity does not require independence. It just requires some form of sequential coherence (either commitment or self-prediction), and independence is only one way to get there, indeed the most restrictive way.
Ergodicity economics as a naturally resolute framework
I closely follow the work of Ole Peters and collaborators and believe it is very cool. There is, unfortunately, a lot of confusion around it, and it is usually not framed in terms of the decision-theoretic apparatus above; that is precisely what I am going to do now.
An agent who maximizes the time-average growth rate of their wealth over their entire trajectory is, I claim, doing resolute choice in the sense McClennen described. They evaluate the whole plan, the entire sequence of bets, the complete wealth process, as a unified object. They ask: “given the dynamics of this stochastic process, what strategy maximizes my long-run growth rate?” And then they execute that strategy.
The “utility function” that falls out of this procedure (via the ergodic mapping, which finds the transformation that renders the wealth process ergodic, so that time and ensemble averages coincide) depends on the dynamics of the process. For multiplicative dynamics, you get logarithmic utility (the Kelly criterion). For additive dynamics, you get linear utility. For more exotic dynamics, you get whatever transformation the ergodic mapping produces. This means the effective utility function is context-dependent: it changes when the stochastic environment changes. And context-dependence of the utility function is precisely what the independence axiom forbids, because independence says your preference between sub-gambles should not depend on what else is in the package.
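Here is a minimal simulation sketch of that claim for the multiplicative case (the win probability, bet size grid, and trajectory length are arbitrary choices for illustration): searching over fixed betting fractions for the one that maximizes the realized long-run growth of a single trajectory recovers the Kelly fraction 2p − 1 for an even-money bet, i.e., exactly what an expected-log maximizer would pick, without log utility ever being postulated.

```python
import numpy as np

rng = np.random.default_rng(0)
p, rounds = 0.6, 200_000                  # win probability, length of one trajectory
wins = rng.random(rounds) < p

def realized_growth(f):
    """Per-round growth rate of one long trajectory betting fraction f each round."""
    return np.log(np.where(wins, 1 + f, 1 - f)).mean()

fractions = np.linspace(0.01, 0.6, 60)
best = max(fractions, key=realized_growth)
print(best, 2 * p - 1)   # both ~= 0.2: the dynamics alone single out the Kelly fraction
```

The log appears here as a consequence of the multiplicative dynamics, not as an assumption about preferences.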
So the EE agent violates independence. But are they exploitable? No. And the reason maps exactly onto the resolute choice framework. The EE agent has committed to a trajectory-level optimization: maximize time-average growth. They don’t re-evaluate at intermediate nodes by asking “given that this branch of the uncertainty has been resolved, what do my local preferences say?” They continue executing the trajectory-level strategy because it was derived from a global evaluation of the entire process. The money pump has no leverage because there is no gap between the agent’s ex ante plan and their ex post behavior. They planned to Kelly-bet (or whatever the ergodic mapping prescribes), and they are Kelly-betting, regardless of what the local branch structure looks like at any given moment.
This connection between ergodicity economics and resolute choice has not, to my knowledge, been articulated before. But it is, I think, the cleanest way to see why EE can violate independence without irrationality.
Now, you may or may not accept the entire EE program, but, at the very minimum, I think the conclusion that the agent should pay attention to the dynamics of the gamble, and that the concrete “utility function” should depend on the gamble, is undeniably valid.
The broader landscape
The fact that independence is the weak point of the vNM framework is reflected in the structure of the entire field of generalized decision theory, where the majority of alternative frameworks are built specifically by relaxing or replacing the independence axiom:
Rank-dependent utility (Quiggin, 1982) replaces independence with “comonotonic independence” (independence holds only for gambles that rank outcomes in the same order). The result is a preference functional that includes a probability weighting function, which distorts the cumulative distribution before integrating against the utility function.
Cumulative prospect theory (Tversky and Kahneman, 1992) combines probability weighting with reference-dependence and loss aversion. It was developed to explain empirical patterns of choice under risk, and it violates independence in multiple ways.
Quadratic utility (Chew, Epstein, and Segal) allows the preference functional to be a bilinear form in probabilities, meaning it is quadratic rather than linear in the probability measure. This captures something akin to sensitivity to the variance of a gamble, not just its mean.
Betweenness preferences (Dekel, 1986; Chew, 1989) weaken independence to the requirement that if you are indifferent between two lotteries, any mixture of them is equally good. This is strictly weaker than full independence and yields preference functionals defined by implicit functional equations rather than explicit integrals.
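As a concrete illustration of the first item on this list, here is a minimal rank-dependent utility evaluator (the weighting function w(p) = p^0.7, the square-root utility, and the numbers are all arbitrary illustrations): decision weights come from a distorted decumulative distribution, so the weight an outcome receives depends on where it ranks within the whole gamble, which is exactly where independence fails.

```python
import numpy as np

def rdu(outcomes, probs, u=np.sqrt, w=lambda p: p ** 0.7):
    """Rank-dependent utility: weighting w applied to decumulative probabilities."""
    order = np.argsort(outcomes)                       # rank outcomes worst to best
    x = np.asarray(outcomes, float)[order]
    p = np.asarray(probs, float)[order]
    decum = np.append(np.cumsum(p[::-1])[::-1], 0.0)   # P(X >= x_i), padded with 0
    weights = w(decum[:-1]) - w(decum[1:])             # distorted decision weights
    return float(np.dot(u(x), weights))

# A 50/50 gamble between 0 and 100, versus 25 for certain:
print(rdu([0, 100], [0.5, 0.5]), rdu([25], [1.0]))    # ~6.16 vs 5.0
```

Because mixing in a common component can reorder the outcomes and therefore reshuffle the decision weights, the common component is not inert under this functional.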
This convergence is not coincidental. When multiple independent research programs, developed by different people with different motivations over several decades, all arrive at the same structural move (relax independence), it suggests that the constraint being relaxed is objectively too strong.
Allais and Ellsberg Behavior Is Rational
Allais Paradox
The Allais paradox is the oldest and most famous demonstration that people systematically violate the independence axiom. The setup, in its simplified form, goes like this.
In situation one, you choose between gamble A (certainty of one million euros) and gamble B (89% chance of one million, 10% chance of five million, 1% chance of nothing). Most people choose A.
In situation two, you choose between gamble C (11% chance of one million, 89% chance of nothing) and gamble D (10% chance of five million, 90% chance of nothing). Most people choose D.
But the move from situation one to situation two is exactly a common-consequence substitution: you strip out the same 89% component from both options in each pair. Independence says this shouldn’t change your preference, so if you chose A over B, you should choose C over D. People do the opposite, and this is treated as evidence of irrationality, a “paradox” revealing that human risk cognition is systematically biased.
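The common-consequence structure is easy to verify mechanically. In the sketch below (exact arithmetic, amounts in millions), A and B are 89/11 mixtures with a common component of one million, C and D are the same mixtures with the common component replaced by nothing, and the 11% parts, where the options actually differ, are identical across the two situations:

```python
from fractions import Fraction as F

def mix(alpha, common, rest):
    """alpha * common + (1 - alpha) * rest, lotteries as {outcome: prob} dicts."""
    out = {}
    for lot, weight in ((common, alpha), (rest, 1 - alpha)):
        for x, p in lot.items():
            out[x] = out.get(x, F(0)) + weight * p
    return out

sub = {5: F(10, 11), 0: F(1, 11)}   # the 11% part where the options differ
million, nothing = {1: F(1)}, {0: F(1)}

print(mix(F(89, 100), million, million))  # A: {1: 1}
print(mix(F(89, 100), million, sub))      # B: {1: 89/100, 5: 1/10, 0: 1/100}
print(mix(F(89, 100), nothing, million))  # C: {1: 11/100, 0: 89/100}
print(mix(F(89, 100), nothing, sub))      # D: {5: 1/10, 0: 9/10}
```

Independence says that whoever prefers the sure million to `sub` as the 11% part must prefer A to B and C to D alike; the common component (`million` versus `nothing`) is not supposed to matter.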
I want to argue that it is not a paradox at all. It is rational behavior that only looks paradoxical if you insist on evaluating each branch of a lottery independently of every other branch, which is exactly what the independence axiom demands and exactly what a holistic reasoner should not do.
Consider why people choose A in situation one. The certainty of one million is qualitatively different from a 99% chance of getting at least one million with a 1% chance of getting nothing. That 1% of nothing looms large because of what it means in context: you are giving up a sure million for a gamble that could leave you with nothing. The certain outcome provides a floor, a guaranteed trajectory, and evaluating the gamble requires considering what happens along the entire trajectory, including the branch where you get nothing while knowing you could have had a certain million.
Now consider situation two. Both options involve a high probability of getting nothing. There is no certainty to give up, no floor to sacrifice. The context has fundamentally changed: you are already in a world where you will probably get nothing, and the question is just whether to take a slightly higher probability of a moderate payout or a slightly lower probability of a much larger one. In this context, going for the higher expected value is sensible.
The shift from A-over-B to D-over-C is a rational response to the fact that the overall risk structure of the gamble has changed. The “common component” (the 89% that was stripped out) was not psychologically or strategically inert: in situation one, it was providing certainty; in situation two, it was providing nothing. Stripping it out changed the context in which the remaining options are evaluated, and a holistic reasoner, one who evaluates their total exposure rather than decomposing gambles into independent branches, should respond to that change.
This is precisely the point we made with the example in the introduction to section 3. If the common component C is a large safety net, you can afford to take more risk on the remaining branch. If C is negligible, you should be more conservative. Your preference between A and B should depend on what else is in the package, because you are one agent facing the total distribution, not a collection of independent sub-agents each evaluating one branch in isolation.
The important distinction here is between the descriptive claim and the normative claim. The descriptive claim (people violate independence in the Allais pattern) has been known since 1953 and is not controversial. What is usually controversial is the normative status of this behavior. The standard treatment in economics and in much of the rationality community says: people violate the axiom, this is a bias, ideally they should be corrected. The position I am defending is the opposite: people violate the axiom because the axiom is too strong, their behavior reflects a rational holistic evaluation of the gamble’s structure, and the “correction” (forcing independence-compliant preferences) would make them worse decision-makers, not better ones.
Ellsberg Paradox
The Ellsberg paradox involves a related but distinct phenomenon: ambiguity aversion. The classic setup: an urn contains 30 red balls and 60 balls that are either black or yellow in unknown proportion. You can bet on the color of a drawn ball. Most people prefer betting on red (known probability of 1⁄3) over betting on black (unknown probability, could be anything from 0 to 2⁄3), even though if you assign your best-estimate probability of 1⁄3 to black, the expected values are identical. This is typically treated as another “irrational” bias: the probabilities are the same in expectation, so why should ambiguity matter?
Ergodicity economics provides a natural and I think quite elegant resolution, and it comes in two layers.
The first layer is a direct Jensen’s inequality argument. Under multiplicative dynamics, the time-average growth rate of a repeated gamble is a concave function of the probability. For a simple multiplicative bet with fraction f of wealth wagered, the growth rate is something like g(p) = p·log(1+f) + (1-p)·log(1-f), which is concave in p.
Now consider the Ellsberg urn in its simplified two-color form: an urn of 60 balls, each black or yellow in unknown proportion, compared against a reference urn of 30 black and 30 yellow. The number of black balls in the ambiguous urn could be 0, 1, 2, …, 60. If you are maximally uncertain and average uniformly over these possibilities, the expected proportion is 30⁄60 = 1⁄2, which matches the known-probability urn. An ensemble-average reasoner sees no difference: E[p] = 1⁄2 in both cases, so the expected value of the gamble is the same.
But concavity of g in p means that Jensen’s inequality applies:
E[g(p)] < g(E[p])
The average time-average growth rate across all possible urn compositions is strictly less than the time-average growth rate you get when the probability is known to be 1⁄2. Each distinct urn composition (0 black balls, 1 black ball, 2 black balls, and so on) defines a different multiplicative process with a different time-average growth rate. You can compute all 61 of these growth rates and average them, and that average will be strictly lower than the single growth rate corresponding to the known 1⁄2 probability, because you are averaging a concave function. The gap is mathematically inevitable, and it is completely invisible to ensemble averaging.
The second layer is about strategic optimality. Even beyond the Jensen’s inequality point, an agent under multiplicative dynamics has a further reason to prefer known probabilities: strategy calibration. The optimal strategy (the Kelly fraction, or more generally whatever the ergodic mapping prescribes) depends on the probabilities. When probabilities are known, you can tune your bet size precisely and achieve the optimal time-average growth rate. When probabilities are ambiguous, you cannot.
The Kelly criterion is uniquely optimal: any deviation from the correct Kelly fraction, whether you bet too aggressively or too conservatively, strictly reduces the time-average growth rate. If the true probability of black is 1⁄6 and you bet as if it were 1⁄3, you are over-betting and your growth rate suffers. If the true probability is 1⁄2 and you bet as if it were 1⁄3, you are under-betting and your growth rate also suffers, less dramatically but still measurably. Regardless of what the true probability turns out to be, as long as it differs from your point estimate, your trajectory-level performance is strictly worse than what you could have achieved with known probabilities.
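Here is a minimal sketch of the calibration point for an even-money repeated bet (the probabilities are arbitrary illustrations): betting the Kelly fraction derived from a wrong point estimate strictly reduces the growth rate you actually experience, in either direction.

```python
import numpy as np

def growth(f, p):
    """Time-average growth rate: bet fraction f at even money, true win prob p."""
    return p * np.log(1 + f) + (1 - p) * np.log(1 - f)

def kelly(p):
    """Growth-optimal fraction for an even-money bet (0 if the edge is negative)."""
    return max(2 * p - 1, 0.0)

p_true = 0.6
for p_est in (0.5, 0.55, 0.6, 0.65, 0.7):
    print(p_est, growth(kelly(p_est), p_true))
# growth peaks exactly at p_est = p_true (~0.020 per round); under-betting
# (0.55 -> ~0.015) loses a little, over-betting (0.7 -> negative) loses a lot
```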
So the agent who prefers known probabilities is, in effect, saying: “I want to be able to optimize my strategy for the actual stochastic process I am embedded in, and I can only do that if I know the parameters of that process.”
How LessWrong Has Engaged with This
The LessWrong community has discussed the independence axiom and related questions multiple times over the past fifteen years, and the landscape is instructive. The pieces are mostly there: the right questions have been asked, the right concerns have been raised, and in one remarkable comment, the right conclusion has been stated almost verbatim. But the pieces have never been assembled into a unified argument.
Armstrong’s “Expected Utility Without the Independence Axiom” (2009)
Stuart Armstrong’s post is, to my knowledge, the earliest serious treatment of dropping independence on LessWrong, and it gets a lot right. Armstrong correctly identifies independence as the most controversial vNM axiom and explores what kind of decision theory remains when you drop it. This was valuable groundwork, and it is to Armstrong’s credit that he took the question seriously at a time when the LessWrong consensus was (and to a significant extent still is) that violating any vNM axiom is ipso facto irrational.
However, Armstrong reaches one conclusion that I think is wrong. His central result is that when an agent faces many lotteries, and those lotteries are independent and have bounded variance, the agent’s aggregate behavior converges to expected utility maximization even without the independence axiom. He writes: “Hence the more lotteries we consider, the more we should treat them as if only their mean mattered. So if we are not risk loving, and expect to meet many lotteries with bounded SD in our lives, we should follow expected utility.”
This is a correct result within its assumptions, but the assumptions exclude exactly the cases where abandoning independence matters most. Armstrong’s convergence argument relies on two things: that the lotteries are independent of each other, and that they aggregate additively (so that the law of large numbers, in its standard additive form, applies to their sum). Under these conditions, yes, the variance of the aggregate shrinks relative to the mean, and the mean dominates, which is equivalent to expected utility maximization.
But for an agent making sequential decisions where wealth compounds multiplicatively, the aggregation is not additive. The relevant law of large numbers for multiplicative processes concerns the geometric mean, not the arithmetic mean. And the geometric mean of a set of multiplicative gambles is determined by the time-average growth rate (the expected logarithm of the growth factor), not by the expected value. The convergence is to the time average, not the ensemble average. The same line of reasoning applies to any non-additive (not only multiplicative) dynamics.
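The divergence is easy to exhibit. In the hypothetical simulation below, each round multiplies wealth by 1.5 or 0.6 with equal probability: the ensemble average grows 5% per round, yet the typical (median) trajectory decays, because E[log r] = 0.5·(log 1.5 + log 0.6) ≈ −0.053 < 0. Armstrong’s additive convergence argument has no grip here.

```python
import numpy as np

rng = np.random.default_rng(1)
rounds, agents = 100, 100_000
factors = rng.choice([1.5, 0.6], size=(agents, rounds))  # E[r] = 1.05, E[log r] < 0
wealth = factors.prod(axis=1)                            # terminal wealth, W0 = 1

print(wealth.mean())      # near 1.05**100 ~= 131 (noisy: driven by rare lucky runs)
print(np.median(wealth))  # near exp(100 * E[log r]) ~= 0.005: the typical agent is ruined
```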
Scott Garrabrant’s comment (2022) — Updatelessness and independence
In December 2022, Scott Garrabrant left a comment beneath a post on EUT that I consider one of the most important things written on LessWrong in the context of this question. I want to quote the core of it and then explain why it matters for my argument.
Garrabrant wrote:
My take is that the concept of expected utility maximization is a mistake. [...] As far as I know, every argument for utility assumes (or implies) that whenever you make an observation, you stop caring about the possible worlds where that observation went differently. [...] Von Neumann did not notice this mistake because he was too busy inventing the entire field. The point where we discover updatelessness is the point where we are supposed to realize that all of utility theory is wrong. I think we failed to notice.
The argument, unpacked, goes like this. The vNM framework, and every axiomatization of utility that Garrabrant is aware of, implicitly assumes updating: when you observe something (say, a coin comes up heads), you condition on that observation and from that point forward you only care about worlds consistent with it. The worlds where the coin came up tails are discarded from your deliberation. This is Bayesian updating applied to preferences, not just beliefs, and it is so deeply embedded in the framework that it is usually invisible.
But the LessWrong/MIRI decision theory research program discovered, through work on Updateless Decision Theory and its successors, that updating is not a requirement of rationality. An updateless agent does not narrow its caring when it makes an observation.
Now here is the connection, and the reason I am presenting Garrabrant’s comment at length.
The updating step that Garrabrant identifies as the hidden assumption in utility theory is, formally, the same thing as the branch-by-branch evaluation that the independence axiom encodes. When you update on “the coin came up heads,” you evaluate your remaining options conditional on this observation, ignoring the tails branch. Independence says this conditional evaluation should be the same regardless of what was on the tails branch, precisely because you are supposed to discard the tails branch after updating. An updateless agent, by contrast, evaluates the entire policy (covering both heads and tails) as a single object, and the value of the heads-branch action depends on what the tails-branch action is, because both are part of the same globally optimized policy.
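Here is a minimal sketch of that structural point. The actions, payoffs, and the evaluation functional (mean minus half a standard deviation, a deliberately non-linear stand-in chosen purely for illustration, not anything from UDT itself) are all invented; the point is only that under any functional that is non-linear in probabilities, the best heads-action depends on the tails-action.

```python
import math

LOTTERIES = {"safe": {1.0: 1.0}, "risky": {4.0: 0.5, 0.0: 0.5}}

def policy_dist(heads_action, tails_action):
    """Outcome distribution induced by a policy over a fair coin flip."""
    out = {}
    for action in (heads_action, tails_action):
        for x, p in LOTTERIES[action].items():
            out[x] = out.get(x, 0.0) + 0.5 * p
    return out

def value(dist):
    """A non-linear evaluation of a whole distribution: mean - 0.5 * std."""
    mean = sum(x * p for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean - 0.5 * math.sqrt(var)

for tails in ("safe", "risky"):
    best = max(("safe", "risky"), key=lambda h: value(policy_dist(h, tails)))
    print(f"tails-action {tails!r}: best heads-action is {best!r}")
# -> 'safe' given 'safe', 'risky' given 'risky':
#    the two branches cannot be optimized separately
```

An expected-utility evaluation, being linear in probabilities, would decouple the two branches; any curvature in the functional couples them, which is exactly the policy-level evaluation Garrabrant describes.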
This is structurally parallel to the EE critique: the time-average reasoner evaluates the entire trajectory (all branches, the full compounding structure) as a unified object, rather than decomposing it into independent branches and evaluating each one after updating on which branch was realized. The EE agent is, in Garrabrant’s terminology, updateless with respect to the temporal unfolding of their wealth process.
Two completely independent lines of thought, one coming from physics and the mathematics of stochastic processes, the other coming from the philosophical and logical analysis of decision theory within the rationalist community, converge on the same structural conclusion: the independence axiom encodes a branch-by-branch, post-update evaluation that is not required by rationality, and the most reflectively coherent agents are those who evaluate holistically rather than branch-by-branch.
Academian’s “VNM Expected Utility Theory: Uses, Abuses, and Interpretation” (2010)
Academian’s post covers a lot of ground, but the section relevant to our discussion is section 5, titled “The independence axiom isn’t so bad.”
Academian’s defense of independence rests on what he calls the Contextual Strength (CS) interpretation of vNM utility. The idea is that vNM-preference should be understood as “strong preference” within a given context of outcomes. When the vNM formalism says you are indifferent between two options (S = D in the parent-giving-a-car-to-children example), this does not mean you have no preference at all. It means you have no preference strong enough that you would sacrifice probabilistic weight on outcomes that matter in the current context in order to indulge it. Under this interpretation, the independence axiom’s requirement that S = D implies S = F = D (where F is the coin-flip mixture) just means you wouldn’t sacrifice anything contextually important to get the fair coin flip over either deterministic option. You can still prefer the coin flip in some weaker sense; you just can’t prefer it strongly enough to trade off against the things that actually matter.
I acknowledge that this is a well-crafted defense, and Academian is admirably honest about most of its limitations. But the CS defense has a critical limitation that Academian does not address: it works only for small, contextually negligible independence violations. The parent-and-car example involves a marginal preference for fairness that is, as Academian argues, plausibly too weak to warrant probabilistic sacrifice in a context that includes weighty outcomes. Fine. But the independence violations that arise in the settings this article is concerned with are not marginal at all.
Consider again the gamble example from section 3. You are choosing between gambles A and B, and the common component C is either a large safety net (ten million euros) or a trivial amount (five euros). Your preference between A and B flips depending on what C is: with the large safety net, you take the risky option; without it, you take the safe one. This is not a whisper of a preference that disappears when larger considerations are in play, but a robust, large-magnitude shift in risk strategy driven by the structural properties of your total exposure. The CS interpretation cannot accommodate this, because the whole point of CS is that independence violations are contextually negligible, and in the cases that matter for EE and for real-world sequential decision-making, they are anything but.
Fallenstein’s “Why You Must Maximize Expected Utility” (2012)
Benja Fallenstein’s post is the most rigorous and carefully argued defense of expected utility maximization on LessWrong, and it is the one that most directly claims what the title says: that you must maximize expected utility. If the argument of this article is correct, Fallenstein’s post is where the disagreement is sharpest.
Fallenstein’s setup is this. You have a “genie,” a perfect Bayesian AI, that must choose among possible actions on your behalf. The genie comprehends the set of all possible “giant lookup tables” (complete plans specifying what to do in every conceivable situation) and selects the one that best satisfies your preferences. Preferences are defined over “outcomes,” which are data structures containing all and only the information about the world that matters to your terminal values. The genie evaluates probability distributions over these outcomes.
Within this setup, Fallenstein argues for independence by analogy with conservation of expected evidence. She writes: “The Axiom of Independence is equivalent to saying that if you’re evaluating a possible course of action, and one experimental result would make it seem more attractive than it currently seems to you, while the other experimental result would at least make it seem no less attractive, then you should already be finding it more attractive than you do.” She then addresses the parent/car/coin counterexample by arguing that if you care about the randomization mechanism, this should already be encoded in the outcome, not in the preference over lotteries.
This is a strong argument, and it is correct within its setup. If you accept the timeless-genie framing, where a perfect Bayesian evaluates all possible world-histories simultaneously and chooses among complete plans from a god’s-eye view, then independence is very nearly trivially true. The genie faces a single, static decision over probability distributions. There is no temporal sequence, no compounding, no intermediate node at which the genie might re-evaluate. The genie simply picks the best plan, and the best plan is the one whose probability distribution over outcomes ranks highest. In this setting, asking whether the “common component” should influence the evaluation is like asking whether an irrelevant column in a spreadsheet should affect which row you pick: obviously not, because you’re evaluating the whole row at once.
But the force of this argument depends entirely on whether you accept the timeless-genie framing as the correct idealization of rational decision-making. And this is precisely what ergodicity economics and the updatelessness research program both call into question.
The genie exists outside of time. It surveys the entire space of possible histories from above, assigns probabilities, and computes weighted sums. This is the ensemble-averaging perspective, formalized as a decision procedure. And it is a perfectly coherent idealization, one of the possible “geometries” of decision theory. But it is not the only one. An agent who is embedded in a temporal process, who faces sequential decisions with compounding consequences, who cannot step outside of time and evaluate all histories simultaneously, lives in a different geometry. For this agent, the temporal structure of the process, the order in which decisions are made, the way outcomes compound, the path-dependence of wealth dynamics are the central features of the decision problem.
Fallenstein’s argument shows that if you accept the timeless-genie setup, you get expected utility maximization, not that you must accept the timeless-genie setup. The question that EE raises, whether a temporally embedded agent facing sequential compounding decisions should evaluate trajectories holistically rather than decomposing them into independent branches, falls entirely outside Fallenstein’s framing. It is not addressed because it cannot be addressed within that framing, just as questions about the curvature of space cannot be addressed within Euclidean geometry. You need a different geometry to even ask them.
Just Give Up on EUT
I think we just need to abandon EUT, once and for all. It is bad at describing humans, bad at describing AIs, and bad at describing potential superintelligences.
The argument for this conclusion has three legs, and I want to make sure all three are visible:
1. Theoretical. The independence axiom is sufficient but not necessary for avoiding Dutch book exploitation.
2. Empirical. The Allais paradox, the Ellsberg paradox, and the general instability of estimated risk-aversion parameters across contexts are not bugs in human cognition that education or debiasing should correct, but features which are exactly what you would expect from agents who evaluate their total risk exposure holistically rather than branch by branch.[4]
3. Convergence of independent research programs. Ergodicity economics and the updateless decision theory program, independently and from completely different starting points, converge on the same structural insight: the branch-by-branch, post-update evaluation that the independence axiom encodes is just one possible rational way to face uncertainty, and there are others.
The rationalist community has an enormous intellectual investment in expected utility maximization. It is woven into the foundations of how this community thinks about decision theory, about AI alignment, about what it means for an agent to be rational. Eliezer’s sequences treat EU maximization as nearly axiomatic. The VNM theorem is invoked routinely as a constraint on what rational agents can look like. A great deal of alignment-relevant reasoning (about corrigibility, about value learning, about what kinds of objective functions a superintelligent agent would have) implicitly assumes that sufficiently rational agents are EU maximizers. That makes it all the more creditable to pivot away from EUT and acknowledge the problems with the independence axiom.
János Bolyai wrote to his father: “Out of nothing I have created a strange new universe.”
We shouldn’t be afraid to do the same in decision theory.
- ^
[Note added after publication] For readers unfamiliar with the independence axiom: it says that if you prefer lottery A over lottery B, then you should also prefer “some chance of A, otherwise C” over “same chance of B, otherwise C,” regardless of what C is. In other words, mixing both options with the same common component C, at the same probability, should never change which one you prefer. Or, more intuitively: your preference between two gambles should depend only on where they differ, not on what they have in common.
- ^
The terminology “sophisticated choice” itself was consolidated by Hammond (1976) and especially by McClennen (1990), who contrasted it explicitly with resolute choice. So, if you want an introduction, McClennen’s book (rather than the original papers on what would later be known as sophisticated choice, which are linked in the section above) is the key source that sets up the three-way taxonomy: naive, sophisticated, and resolute.
- ^
[Note added after publication] Elliott Thornley correctly pointed out that the claim “sophisticated choice avoids money pumps” needs qualification. On the standard definition, an agent is money-pumped if they end up paying for something they could have kept for free, even if no sequential exploitation occurred. By this definition, sophisticated choosers can be money-pumped: because they adjust their plans to accommodate foreseen future deviations rather than committing to the globally optimal plan, they can end up with outcomes that are dominated from the ex ante perspective.
But the resolute chooser is immune to this. They commit to a plan and execute it; no amount of incentive-fiddling at terminal nodes changes their behavior, because they don’t re-derive their strategy from backward induction. So, the contrast between sophisticated and resolute choice is sharper than the initial version of the article suggested: it looks like sophisticated choice is not “a weaker but acceptable alternative to resolute choice”, but is really exploitable in a way that resolute choice is not.
- ^
[Note added after publication] A very recurring theme in the comments, and one that correspondingly deserves explicit treatment, is the following defense of the independence axiom: any apparent violation can be dissolved by enriching the outcome space. If a person exhibits the Allais pattern (preferring certainty in one choice pair but going for higher expected value in a related pair), one can always say: “the person isn’t really choosing between $0 and $1M; they’re choosing between $0-accompanied-by-the-feeling-of-having-missed-a-sure-million and $0-in-a-context-where-nothing-was-guaranteed, and these are different outcomes, so independence isn’t really violated.” The same move is available for any apparent violation: whatever contextual, psychological, or dynamic factor drove the preference reversal can be folded into the outcome description, restoring independence by construction.
This move makes the independence axiom unfalsifiable: no possible pattern of observed behavior can count as a violation, because any pattern can be accommodated by a sufficiently enriched outcome space.
There is a structural tension in how the independence axiom is used in practice. There are really two versions of it:
The abstract version defines outcomes so broadly (encoding arbitrary contextual information, psychological states, the mechanism of randomization, the history of the decision process, etc.) that independence holds tautologically. This version is unfalsifiable and therefore normatively inert.
The applied version defines outcomes as the tangible quantities that economists, financial advisors, and rationalists actually care about: monetary payoffs, consumption bundles, lives saved, years lived. This version is substantive, falsifiable, and in fact falsified by the Allais pattern, the Ellsberg pattern, and the general instability of estimated risk preferences across contexts.
The argument of this article is directed at the applied version.
Really nice post. A few things though:
1.
This isn’t right, at least on the usual definition of ‘money pump’ where an agent is money-pumped if and only if they “end up paying for something they could have kept for free even though they knew in advance what decision problem they were facing.” As you say, sophisticated choosers who violate Independence sometimes have to settle for plans that are dominated from the ex ante perspective. That’s a money pump on the usual definition.
2.
It doesn’t seem right to list Quiggin’s rank-dependent theory and Tversky and Kahneman’s cumulative prospect theory as evidence that Independence is normatively too strong, since (IIRC) both are put forward as descriptive models of how humans actually behave, rather than normative models of how they should behave. (That said, Lara Buchak defends rank-dependent theory as a normative model (under the name ‘Risk-Weighted Expected Utility Theory.’))
3.
You don’t really reckon with the arguments against resolute choice. I like Gustafsson’s discussion in chapter 7. A summary: resolute choice either requires acting against your own preferences at the moment of choice (which seems instrumentally irrational) or else modifying your preferences (which is no defence of your original preferences).
4.
I think the Allais argument against Independence doesn’t really work. The Allais preferences can be rational if you’d feel extra disappointed getting $0 when you only had a 1% chance of doing so. But $0-with-extra-disappointment is a different outcome to $0, so those preferences don’t violate Independence!
I strongly agree, and I think that it’s worth emphasizing that people optimize (partially) for their own emotions, and choices which seem irrational when this consideration is neglected can be rational when it is taken into account.
With that being said, there’s still a chance that an Allais-like argument could work.
Let’s imagine a different hypothetical choice:
This has a similar structure to the original Allais choice, but the 1% risk of feeling disappointment from choosing option B is gone, because you’ll never find out.
If people still choose A over B and D over C here, then I think that we could conclude that people violate Independence. This is an empirical question; has a study like this ever been done?
(I leave open the question of whether this would be a mark against Independence or a mark against people’s instinctive decision-making.)
That’s a cool idea! I’m not aware of any study like that, but I’d be very interested to see the results.
For what it’s worth, I’d still pick A over B and D over C with that change. I think I kinda compress C and D to “the charity is pretty much not gonna get any money, but on the off chance it does, might as well make it 5x more” but with A and B I still would rather they be able to work with a million rather than risk not getting anything, even if B can be compressed to “they pretty much get a lot of money, with an off chance of 5x, and a fluke chance of nothing”.
I think it might be more a question of how bad is it to not get anything? If the charity was already well funded, maybe even so funded they don’t know what to do with all the money they already have, I’d pick B and D. Likewise if I was a billionaire in the original question, I’d pick B and D. But I’m not and the charity I had in mind is not super well funded, so the cost of no money is too high when comparing A and B.
I can see a problem with the way that I phrased the question here. I wanted an example of something that a person would value and want to make happen, but which they might plausibly not find out about. I wasn’t imagining a specific charity, but I was thinking of something linear in terms of good done per money donated, which would be something large that’s already adequately funded but not saturated. Yet one could imagine a specific charity when answering the question, and the conditions of that charity could affect the shape of the utility-vs-money curve. That means that the question could end up measuring a feature of someone’s contextual utility-vs-money curve instead of measuring their reaction to risk.
I just need an example of something that’s really good, and something else that’s five times as good, both which a person might not find out about. Maybe we could use lives saved—strangers’ lives, and you’ll never find out who—but people could have weird moral intuitions regarding saving lives that distort the results. (There’s a famous example of a framing effect, the ‘Asian disease’ problem, based on this.)
We could stick with the charity example and specify linearity in utility-vs-money, but that wouldn’t be a concise question, and it could be misunderstood.
Does anyone have any better ideas?
Scenario A: your friend gets $10
Scenario B: 89% odds your friend gets $10, 10% odds they get $50, 1% odds they get nothing
Scenario C: 11% odds your friend gets $10, 89% odds they get nothing
Scenario D: 10% odds your friend gets $50, 90% odds they get nothing
I pick B and D in these, because if my friend gets nothing in any of those scenarios it doesn’t matter. I think it really is an issue where once the guaranteed value in A gets past a certain point, almost any odds of losing it become intolerable. Maybe human values aren’t linear?
edit: and if it’s a well-funded-but-not-saturated charity, I pick B and D too, although if we’re talking about a million and 5 million it’s a tough call.
A potential reframe: certainty has a lot of value. I would not pay $10 for a plane ticket with 10% odds that I actually get to go to the place, because I can’t plan around that effectively, even if the expected value is the same as a ticket that costs $100 and takes me there with ~100% certainty.
Thank you for the careful engagement!
On point 1: You’re right. The more precise statement would be: sophisticated choice avoids being exploited through a sequence of individually-accepted trades, but can still lead to ex ante dominated plans, because the agent adjusts their initial plan to accommodate foreseen future deviations rather than committing to the globally optimal plan. This is a real limitation of sophisticated choice relative to resolute choice, and I will add a note about it to the post.
On point 2: The point I’m making is about the convergence pattern rather than the original intent of any individual theory—the fact that multiple independent research programs, both descriptive and normative, all arrive at the same structural move (relax independence specifically, rather than transitivity or completeness or continuity).
On point 3:
Gustafsson’s dilemma is powerful indeed: at the moment of executing the resolute plan, either you are acting against your current preferences (which seems instrumentally irrational) or you have modified your preferences to align with the plan (in which case you’re not defending your original non-EU preferences, you’ve just adopted different ones).
I think the ergodicity economics framework provides a clean escape from both horns of this dilemma. Consider an EE agent maximizing time-average growth rate over their trajectory. At every node, their preference is the same: execute the strategy that maximizes trajectory-level growth. This preference doesn’t change at intermediate nodes, ever. The appearance of “acting against your preferences at the moment of choice” arises only if you evaluate the agent’s node-level behavior through an EU lens and ask “given that you’re at this node, doesn’t a different action have higher conditional expected utility?” But the agent’s actual preference was never about conditional expected utility at individual nodes. Their preference is about the trajectory as a whole, and that preference is entirely stable throughout the process.
So the EE agent is neither acting against their current preferences (horn a), since their trajectory-level preference consistently favors the same action at every node, nor modifying their preferences (horn b), since their preference was always trajectory-level and never changed. The dilemma’s force, as I see it, depends on assuming that the “real” preferences at any node must be the ones that EU would assign conditional on being at that node. Rejecting that assumption, which is precisely what rejecting the independence axiom amounts to, dissolves the dilemma.
On point 4:
I think we discussed it already in other comments. But in a nutshell, what I mean is that this is a well-known defense, and it has a well-known cost: it makes the independence axiom unfalsifiable. If any apparent violation can be resolved by saying “the outcomes are actually different because of the context-dependent psychological state” (disappointment, regret, elation from near-misses), then no possible pattern of behavior could ever count as a real independence violation. Any behavior whatsoever can be accommodated by enriching the outcome space with the right context-dependent psychological states.
This is fine if we want independence to be a definitional truth, but then it carries no normative force: it cannot tell us that any particular pattern of behavior is irrational, because every pattern can be rationalized by the appropriate outcome redefinition. And it cannot do any predictive or explanatory work, because it accommodates everything and therefore constrains nothing.
This is the motte-and-bailey structure I discuss in the article when talking about Academian’s and Fallenstein’s posts.
Overall: In its most general form (where outcomes can encode arbitrary contextual and psychological information), EUT is unfalsifiable.
In its actual applied form, it is falsified by the Allais pattern.
In case you missed it, Scott Garrabrant also has a post: Geometric Rationality is Not VNM Rational
Thanks!
UDT was (in part) the result of asking this. See my 2009 post indexical uncertainty and the Axiom of Independence, which pointed out that indexical uncertainty does not satisfy the Axiom of Independence, and in the comments @Vladimir_Nesov pointed out that this extends to more general kinds of uncertainty. UDT was then written up 2 months later, informed by this discussion.
(That write-up still used expected utility maximization with regard to logical/mathematical uncertainty, with a parenthetical note “This specifically assumes that expected utility maximization is the right way to deal with mathematical uncertainty. Consider it a temporary placeholder until that problem is solved.” AFAIK that problem is still open today. @Scott Garrabrant I think this answers your “weird to have it be formalized in term of expected utility maximization”?)
Nice post!
I want to push on one thing, though. I’m sceptical of the claim that the ergodicity economics agent violates independence.
As I understand it, the EE agent has a fixed objective: maximize the time-average growth rate of wealth, which is equivalent to maximizing expected log terminal wealth. When the stochastic environment changes — say from multiplicative to additive dynamics — the optimal per-bet policy changes. In the multiplicative case, you Kelly-bet (which looks like log utility applied locally). In the additive case with many independent bets, you behave roughly linearly with each bet (because log is approximately linear for small additive increments relative to total wealth).
But is this actually a violation of independence? Independence says: if you prefer lottery A to lottery B, then mixing both with a common lottery C at the same probability shouldn’t reverse that preference. It’s a constraint on your ranking of probability distributions over outcomes.
What the EE agent is doing seems different. They have a fixed preference over (distributions over) outcomes (log terminal wealth, or equivalently, time-average growth rate). When the dynamics change, the mapping from available actions to outcome distributions changes, so the optimal action changes. But the preference ordering over final outcomes hasn’t changed — the agent still prefers higher log wealth to lower log wealth. It’s the decision problem that’s different, not the preferences.
To put it another way: an EU maximizer with log utility would make exactly the same choices as the EE agent in every case you describe. They’d Kelly-bet in multiplicative environments and behave more linearly in additive ones, because that’s what maximizing expected log wealth requires in each setting. But the EU maximizer with log utility satisfies independence by construction. So how can the EE agent be violating independence while making identical choices?
I think the thing that looks like a context-dependent utility function is really a context-dependent policy derived from a fixed utility function under different dynamics. These seem importantly different, and I’m not sure the independence axiom is violated by the latter.
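A quick numerical check of “satisfies independence by construction” (my own sketch; the lotteries are random examples): for any fixed-utility evaluator, mixing two lotteries with a common lottery Z at the same probability never reverses their ranking, because expected utility is linear in the mixing probability.

```python
import math, random

random.seed(0)

def eu(lottery, u=math.log):          # lottery: list of (prob, outcome) pairs
    return sum(p * u(x) for p, x in lottery)

def mix(p, lottery, z):               # the compound lottery p*lottery + (1-p)*z
    return [(p * q, x) for q, x in lottery] + [((1 - p) * q, x) for q, x in z]

for _ in range(10_000):
    X = [(0.5, random.uniform(1, 100)), (0.5, random.uniform(1, 100))]
    Y = [(0.5, random.uniform(1, 100)), (0.5, random.uniform(1, 100))]
    Z = [(1.0, random.uniform(1, 100))]
    p = random.uniform(0.01, 0.99)
    if eu(X) > eu(Y) + 1e-9:          # if X is ranked above Y ...
        assert eu(mix(p, X, Z)) > eu(mix(p, Y, Z))  # ... any common Z preserves that
```

The assertion never fires, since eu(mix(p, X, Z)) = p·eu(X) + (1−p)·eu(Z) for any Z.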
EE isn’t VNM because it violates completeness—there are lotteries that can’t be compared to each other. An example (from here):
There is no single transformation function that makes both of these lotteries ergodic, so EE has no way of saying which is better.
I’m not sure whether EE violates independence; like you, I’m not convinced, but I’d have to think about it more to say with confidence.
Oh, you just apply different ergodic transformations to different lotteries, of course.
Also, beware that besides this wrong example, the linked paper contains other basic misconceptions about EE, like for example the claim that EE is equivalent to log utility.
You have to apply different transformations to different lotteries, because EE requires that all lotteries be transformed such that the result is ergodic. There is no single transformation function that can make a multiplicative lottery ergodic while also making an additive lottery ergodic.
It does not make that claim. The claim was that there are multiple transformation functions that can make multiplicative bets ergodic, but in practice, EE proponents always use the logarithm function, which produces a decision theory that’s equivalent to log utility for the special case of multiplicative bets.
EE has the objective of maximizing the time-average growth rate, but it is generally not equivalent to maximizing expected log terminal wealth. This is probably the single most common misunderstanding of ergodicity economics.
That is exactly my point: they are not making identical choices.
For multiplicative dynamics, yes, they coincide exactly. For additive dynamics, they diverge: the log-utility maximizer remains risk-averse (because log has curvature everywhere), while EE prescribes risk-neutrality (linear evaluation) for additive dynamics. The EE agent with an additive gamble would accept bets that the log-utility maximizer would reject.
You say:
In the additive case, within the EE framework you behave not roughly linearly but exactly linearly, and not because log is approximately linear, but because the ergodic mapping is the identity mapping in the case of additive dynamics.
And to address your core question of whether independence is actually violated: yes, it is, and here is one simple example similar to the one in the post itself.
Consider two compound lotteries sharing a common component C. Independence says your ranking of the non-common parts should not depend on what C is. But in EE, the ergodic mapping is applied to the dynamics of the entire wealth process, not branch by branch. When C is a multiplicative process, the overall dynamics are multiplicative, the ergodic mapping yields log, and you might prefer gamble A on the non-common branch. When C is an additive process, the overall dynamics are additive, the ergodic mapping yields linear (identity), and you might prefer gamble B on the non-common branch. Your preference between A and B has flipped depending on the common component, which is exactly what independence forbids.
This happens because the ergodic mapping is a global property of the stochastic process. You cannot decompose a compound lottery into branches, apply the ergodic mapping to each branch separately, and recombine.
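A toy numerical version of this flip (my own sketch: the specific gambles are assumptions, and I evaluate just the non-common branch under the mapping induced by C’s dynamics, which simplifies the full trajectory-level evaluation):

```python
import math

W = 100.0  # current wealth (an assumed figure)

A = [(1.0, W + 5.0)]                  # gamble A: certain modest gain
B = [(0.5, 2.0 * W), (0.5, 0.5 * W)]  # gamble B: 50% double, 50% halve

def value(gamble, mapping):
    """Probability-weighted average of the mapped outcomes."""
    return sum(p * mapping(x) for p, x in gamble)

log_map = math.log       # ergodic mapping when the whole process is multiplicative
identity = lambda x: x   # ergodic mapping when the whole process is additive

print(value(A, log_map) > value(B, log_map))    # True: C multiplicative -> prefer A
print(value(B, identity) > value(A, identity))  # True: C additive -> prefer B
```

Under log, A scores ln(105) ≈ 4.654 against B’s 0.5·ln(200) + 0.5·ln(50) = ln(100) ≈ 4.605; under the identity, B scores 125 against A’s 105. The ranking of the non-common branch flips with the dynamics of the common component.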
If this were true, I’d agree that independence isn’t violated.
“Maximize time-average growth rate” is a fixed preference, but it’s a preference over something richer than probability distributions over terminal wealth. It’s a preference over trajectory-classes-indexed-by-dynamics: you’re evaluating not just “what’s the distribution of my final wealth” but “what’s the distribution of my final wealth given the dynamic process I’m embedded in.” And vNM utility is defined over probability distributions over outcomes, not over dynamics-outcome pairs. The EE meta-preference cannot be expressed as a vNM utility function over outcomes because the dynamics are not outcomes, unless you trivially define the dynamics as “outcomes” and then it works “by definition”.
Thanks for the detailed reply! I want to push back on the claim that the EE agent and the log-utility maximizer make different choices.
You say that in the additive case, the EE agent is exactly risk-neutral while the log-utility maximizer remains risk-averse. But consider what a log-utility maximizer actually does when facing a long sequence of independent additive bets.
For a single additive bet with payoff x, the contribution to log wealth is ln(W + x) - ln(W) ≈ x/W for small x relative to W. So the log-utility maximizer treats each bet approximately linearly. And as the number of independent additive bets grows, each individual bet becomes smaller relative to total wealth (because wealth is growing via the accumulated positive-EV bets), making the linear approximation increasingly exact.
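A quick numerical check of that approximation (my own sketch, with an assumed wealth of $100k):

```python
import math

W = 100_000.0
for x in (10.0, 100.0, 1_000.0, 10_000.0):
    exact = math.log(W + x) - math.log(W)   # actual change in log wealth
    approx = x / W                          # linear approximation
    # relative error grows with the bet's share of wealth
    print(x, exact, approx, abs(exact - approx) / exact)
```

At x = $10 the relative error is about 0.005%; at x = $10,000 (10% of wealth) it is about 5%.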
In the limit of infinitely many independent additive bets, the log-utility maximizer’s per-bet behavior converges to exact risk-neutrality — which is exactly what EE prescribes. So in the regime where the EE prescription is most cleanly motivated (many repeated bets, which is the whole setup for time-average reasoning), the two agents converge.
Where they diverge is for large additive bets relative to current wealth. But here I think the log-utility maximizer is actually more faithful to the “maximize time-average growth rate” objective than the EE agent is. If you’re risk-neutral about an additive bet that could cost you half your wealth, you’re exposing yourself to ruin risk, which destroys your time-average growth rate. The identity mapping tells you the additive process is ergodic, but being ergodic doesn’t mean you should be risk-neutral about large bets within it — not if your goal is to maximize the growth rate of your own single trajectory.
So it seems to me that either (a) the bets are small relative to wealth, and the log-utility maximizer behaves (approximately, converging to exactly) like the EE agent, or (b) the bets are large relative to wealth, and the log-utility maximizer is arguably more correct about maximizing time-average growth. In neither case do the two agents clearly diverge in a way that supports the independence violation argument.
What am I missing?
If I’m right, then again it doesn’t seem like the EE agent is really violating independence, given that its behaviour is replicated by a vNM agent that just values its final log wealth.
I probably don’t have time to answer this in detail. The way I see you could understand this is to imagine a completely different dynamic, neither multiplicative nor additive, something weirder, like for example square root or raising to the power of 2. For each such dynamic, EE will give different answers, because the ergodic mapping is different for each, and also—sometimes—dramatically different from log utility or linear utility.
But I don’t think EE would give different answers! EE would give the answer that would maximise the time-averaged geometric growth rate. That would be the same answer that would maximise the expected final log wealth after a long series of identical decisions.
So yes, what you say is exactly the misconception about EE. EE will give different answers for each different dynamic, because the time average is calculated differently for each dynamic. Please see, for example, section 6.6 in the book on EE.
Which of dynamic consistency and consequentialism is not obeyed in EE situations like the one you suggested and how?
My intuition is that
Both are obeyed, and EE can in fact be expressed as a vNM utility function (prove me wrong by answering the above question! It’s an intuition because I haven’t yet been able to pin down how to express EE as a vNM utility function, more below).
Utility theory axioms imply that some utility function exists, not its form.
Said utility function depends on your preferences. If I have a buck and you offer me repeated 50-50 odds of winning 50% of everything or losing 40% of everything, it’s a bad bet—unless this is the only way for me to make money and I want to buy something for two bucks. In this case, a chance is better than nothing.
More generally, EE is specifying a certain set of preferences where having a higher expected wealth is less important than ensuring that growth occurs with measure 1. This is a useful set of preferences in the real world, but it is neither necessitated nor argued against by the axioms of utility theory.
I expect that if my preferences are about what will happen in the limit of the game, and I care about what happens most of the time, then the utility function will converge to the EE solution, regardless of the specific dynamics. (Is that defining dynamics as outcomes?)
Fascinating post! I really appreciated your articulation of the difference between a utility function as an ordinal or as a cardinal, and learning about sophisticated choice, which I hadn’t heard of before.
However, I’m skeptical of the ergodicity economics. Utility is meant to be calculated either at the conclusion or cumulatively, not at a midpoint. It seems like this proves too much: you can construct all kinds of weird dynamics, such as having all your money stolen if you make more than a certain amount. You aren’t really learning anything about utility by constructing such cases; you’re planting a certain dynamic in the market and unjustifiably reading it out as a statement about utility.
I do have another argument for you though (to wield against decision-theoretic consequentialism):
The main thing I like about this is how it very concretely demonstrates one possible challenge to the Independence Axiom; an axiom that is somewhat abstract/hard to grok.
(This is an iteration on a thought experiment originally discovered by me/Cousin_it).
Thanks for the kind words!
On EE: I think there’s a misunderstanding about what the dynamics are doing in the framework. EE doesn’t arbitrarily construct dynamics and then read out a “utility function” from them. The dynamics are empirical facts about the stochastic process. Wealth under repeated proportional gambles compounds multiplicatively; this is arithmetic, not a modeling choice. EE then asks: given these actual dynamics, what should you optimize? And the answer (the time-average growth rate, operationalized through the ergodic mapping) follows from the mathematics.
You’re right that weird dynamics (money confiscated above a threshold) would yield weird prescriptions, but that’s exactly the point: if you face such dynamics (think of margin calls, progressive taxation, or bankruptcy thresholds), your optimal strategy should be different from what it would be under simple multiplicative compounding. Different dynamics warrant different strategies. The claim that the correct evaluation function depends on the actual dynamics the agent faces is the central thesis of EE, and the fact that different dynamics yield different functions is the feature, not a bug.
Also, the time-average growth rate is not a “midpoint” evaluation. It’s a property of the entire trajectory taken as a whole over its full duration. If anything, EE is more focused on the complete trajectory than standard EU, which decomposes the evaluation into independent branches.
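To make the “this is arithmetic” point concrete, a minimal simulation (my own sketch, not from the post) of the repeated +50%/-40% proportional gamble: the per-round expected value is 0.5·1.5 + 0.5·0.6 = 1.05 > 1, yet the time-average growth factor is sqrt(1.5·0.6) ≈ 0.949 < 1, so the typical trajectory decays.

```python
import random, statistics

random.seed(0)
finals = []
for _ in range(10_000):                  # ensemble of agents
    w = 1.0
    for _ in range(100):                 # rounds per agent
        w *= 1.5 if random.random() < 0.5 else 0.6
    finals.append(w)

# sample mean: pulled far above the median by a few lucky runs
# (the true ensemble mean is 1.05**100 ≈ 131.5, dominated by rare outcomes)
print(sum(finals) / len(finals))
# typical agent: collapses toward zero, around 0.949**100 ≈ 0.005
print(statistics.median(finals))
```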
On the Refined Counterfactual Prisoner’s Dilemma: I think it most directly challenges causal decision theory (since the mechanism involves Omega’s prediction creating correlation between branches), which puts it in Newcomb-problem territory more than pure independence-violation territory.
You might also be interested in philosopher Lara Buchak’s book Risk and Rationality.
She makes a thought-provoking analogy between making decisions that result in a distribution over future selves and population ethics—in population ethics you’re not required to value everyone linearly, it’s okay to reject utility monsters and say “actually I just prefer universes where people are more equal.” Decision-making without independence is like population ethics over the distribution over future selves.
Thanks, will take a look!
True or false? “In a free market containing lots of different smart AI agents making decisions in lots of different ways, we should expect the AIs that follow the independence axiom to systematically gain control of more resources over time than the AIs that don’t.”
@Steven Byrnes I suspect that free markets are an environment which also does things like incentivizing treachery or races to the bottom. What I would expect is UDT-following agents being more likely to coordinate with each other, since the author didn’t actually propose an alternative decision theory. I suspect that the UDT should’ve been derived from a version of superrationality, but I don’t understand how one embeds big neural nets or agents involving randomness into predictors.
I’m not sure I buy this post’s assertion that UDT violates independence. It seems more like it violates “common sense independence”, in the same way it violates “common sense choosing the best option” when it one-boxes on Newcomb’s problem.
An agent locally acting according to a good policy might violate what a CDT agent would call independence, but it still obeys independence when choosing a policy, i.e. it has a numerical utility function, just not over the same stuff as the CDT agent.
One-boxing does not violate “common sense best option”. More people would one-box than would two-box (although it’s pretty close to 50/50): https://www.youtube.com/watch?v=Ol18JoeXlVI. In the steelman for one-boxing, the math favors one-boxing under an expected utility approach anyway, as long as you think the genie has >50% probability of predicting you correctly. Plus, if two-boxing is common sense, why ain’tcha rich?
Well, if you formalize “gain control of more resources over time” as taking the EV of resources controlled, the agents that also make decisions based on EV of resources controlled will do well. But if you formalized it in a different way, the agents that make decisions in that different way will do well :D
Yeah, this might be getting at a similar sort of difficulty as we have when looking for a universal/natural/objective criterion to compare decision theories. (I remember a post on this topic by Caspar Oesterheld, but can’t find it now.)
My operationalization would be something along the lines of: For all the worlds with “a free market containing lots of different smart AI agents making decisions in lots of different ways”, what is the total probability measure of those in which the most powerful agents are the ones satisfying the independence axiom?
Or: across all such worlds, what is the total measure of independence-obeying agents relative to the total measure of “most powerful agents”?
(Well, I guess this is just binning/discretizing power, so as to avoid blowing up its expected value.)
“The lack of performance metrics for CDT versus EDT, etc.” (An important post! FWIW I discuss some related implications for forecasting AIs’ decision theories here.)
I think you do a good job of arguing (in the earlier part of the article) that it is logically possible to drop the independence axiom without being money-pumped by giving up logical consequentialism but keeping dynamic consistency. However, I think you do a poor job of arguing (in the later parts) that we should give up consequentialism.
You examine 3 in-depth examples to try to show that we’d be fine if we dropped independence: ergodicity economics, the Allais Paradox, and the Ellsberg Paradox. In all 3 cases, I think your argument is missing a critical step that is required for its validity.
1.
In the section on ergodicity economics, you claim ergodicity follows resolute choice because it forms a plan based on the entire decision tree and then sticks to that plan. But this isn’t sufficient to carry your point, because agents that obey the independence axiom can also be described as sticking to their original plan. (In fact, any agent with dynamic consistency can be described this way, and you agreed we need dynamic consistency.)
What you’d need to show in order to carry your point is that ergodicity violates consequentialism. For example, you could show this by constructing an example where a local re-evaluation would deviate from the original plan, but ergodicity follows the original plan anyway. Without showing that, this example fails to support your case.
2.
In the section on the Allais Paradox, you give the following reasoning for why the common human answer is rational:
But this reasoning seems to be exactly backwards from the actual result: When component C provides a safety net of $1M, humans choose the lower-risk option A, but when component C provides nothing, humans choose the higher-risk option B. Your argument in this paragraph undermines, rather than supports, the rationality of the choice you are defending.
And aside from this one backwards paragraph, you don’t seem to offer any basis at all for how the context ought to change the answer. You have several paragraphs of philosophical hand-waving about how it is good and appropriate that it should, but don’t appear to offer anything like an algorithm saying how we should take it into account. Without a model predicting the preference for A over B, you fail to win any Bayes points.
Nothing in this section sounds like a logical reason to consider the common human choice in the Allais Paradox more rational than I previously did.
Sidenote: Empirical Money Pumps?
This discussion also suggests the question: Can you actually, in real life, use the Allais Paradox to money-pump humans? If you can, then the behavior of humans does not provide evidence of the rationality of their choices in this scenario, regardless of any theoretical arguments about how we could avoid money pumps while keeping this preference. My brief Google search failed to immediately turn up any experiments involving actual money pumps, but I haven’t done a careful literature review.
Sidenote: Can the Allais Paradox result be justified in other ways?
There’s two defenses for this that I somewhat credit:
A.
Eliminating a possible outcome makes it cognitively cheaper to plan for what happens after the lottery, because you don’t need to consider as many distinct cases.
Notice this reasoning only applies if there is an “after”, which is usually true in real life but usually false in abstract formal examples.
B.
Suppose you are living among a population of similar agents that compete for resources, and all of the other agents get to make a similar choice between lotteries. Then, the outcome where you get nothing is always the same in terms of absolute resources, but not in terms of relative resources when comparing to other people.
If you choose between a 1% chance and a 0% chance of getting nothing, then the few agents who end up with nothing will be out-competed by almost everyone around them. They will lose approximately all competitions and will be the obvious choice for predators to target.
If you choose between a 90% chance and a 89% chance of getting nothing, then agents who win millions will still out-compete the ones who get nothing, but they’ll have a harder time monopolizing all opportunities because there won’t be as many winners. Many of the “losers” will still have a decent relative standing.
This reasoning doesn’t apply if you somehow know this lottery is a special one-time opportunity for you only, but it seems plausible that our instincts evolved mostly to deal with non-unique opportunities.
However, notice that these two reasons justify different things. Reason A justifies zero-risk bias, i.e. paying a premium to reduce a risk to ~zero; it has a sharp change in your preferences at a specific probability. Contrariwise, reason B would remain nearly as strong if we changed “1% or 0%” to “2% or 1%”.
3.
In the section on the Ellsberg Paradox, I think you make some clever points about why the standard human answer might be rational, but I don’t see how any part of this section ties into logical consequentialism or the violation thereof. For example, you have not explained how a money pump could be constructed based on this scenario.
4.
The argument in favor of logical consequentialism is obvious: If you violate it, you are leaving money on the table. (Violating it implies that you are making a choice that satisfies your own preferences less than another choice you could have made in the current circumstances.)
In fact, this is essentially the same reason that we think that vulnerability to money pumps is bad (you end up with less money than you predictably could have). So it seems pretty weird to argue that we need to keep all the axioms that prevent money pumps but it’s somehow ok to drop consequentialism. I’m not sure what set of assumptions would validly lead to that combination of conclusions.
I love the post and the comparison with Euclid’s fifth postulate. Obvious in hindsight, hadn’t occurred to me.
I’d love to see some sort of systematic classification/taxonomy of independence-violating decision frameworks.
I am thinking about it, hopefully there will be a post on that.
A friend and I spent a day or two poring over… one of Ole Peters’s papers, I believe this one… a couple years ago, and came away feeling like we still didn’t “get it.” Reading this post, I, uh, well, I still don’t get it, but thank you for helping my confusion at least crystallize into an I-think-well-defined question:
What valid (coolness, recommendation) pair is most likely to make me say …?
Your Situation type is single-shot (and that is similar to some other comments): one action and one probability distribution. EE’s claim is specifically about sequential, multiplicative settings, which your type signature doesn’t yet express. If you extend the formalization to a repeated game with multiplicative compounding, the answer becomes concrete:
And then coolness(recommendation) > coolness(ev_maximization) is a mathematical theorem. The expected-value maximizer bets everything every round and almost surely goes broke. The Kelly bettor bets a fraction and almost surely achieves the maximum long-run growth rate. After enough rounds, the Kelly bettor is richer than the EV maximizer with probability approaching 1.

The single metric by which EV maximization “wins” is mean wealth across a hypothetical ensemble of parallel agents, which is dominated by vanishingly unlikely astronomical outcomes that almost no individual agent will ever experience.
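A minimal simulation of that claim (my own sketch; the Kelly fraction f = 0.25 is derived for this specific +50%/-40% game, as the maximizer of 0.5·ln(1 + 0.5f) + 0.5·ln(1 − 0.4f)):

```python
import random

random.seed(1)

def play(fraction, rounds=1000):
    """Bet a fixed fraction of wealth each round of the +50%/-40% coin."""
    w = 1.0
    for _ in range(rounds):
        w *= (1 + 0.5 * fraction) if random.random() < 0.5 else (1 - 0.4 * fraction)
    return w

print(play(1.0))   # all-in "EV maximizer": wealth ~ e^(-0.053 * rounds), vanishingly small
print(play(0.25))  # Kelly bettor: wealth typically grows at ~ e^(+0.0062 * rounds)
```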
This is EE’s core claim: expected value is the wrong coolness function for a single agent in a multiplicative sequential environment, and the right coolness function is the time-average growth rate, which the ergodic mapping formalizes and generalizes beyond the multiplicative case.

(Thanks for your response! I’m pessimistic that this conversational subtree will lead to great insights for either of us, but am jotting down my scattered thoughts in case they’re of interest.)
It seems to me that you can represent a sequential setting as a one-shot Situation whose Action type is a function from “observations so far” (in your +50%/-40% example, List<"win"|"lose">) to “the action I take in that sub-situation” (in your example, the fraction of my bankroll I bet on the next flip).

(...maybe you can’t do this transform if you violate dynamic consistency? But violating dynamic consistency seems Actually Crazy in a way that merely violating consequentialism isn’t. Something is deeply broken if you simultaneously think “I’m going to do X if Y happens” and “I know that after Y happens I won’t do X.”)
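A small sketch of that transform in code (my own type names, assuming the +50%/-40% game):

```python
from typing import Callable, List, Literal

# The sequential game becomes a one-shot choice whose Action is a whole policy.
Observation = Literal["win", "lose"]
History = List[Observation]
Policy = Callable[[History], float]   # fraction of bankroll to bet next

# A Kelly-style policy for the +50%/-40% coin ignores the history entirely:
kelly_policy: Policy = lambda history: 0.25
```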
In your “fair coin, +50%/-40%” example: if I have the opportunity to play this game for 10 rounds, and my life savings are $100k, then I agree “always bet everything” seems like a bad plan, and optimizing for my median net worth at the end seems pretty reasonable.
...but if $100k is just the contents of my wallet, and I have $X in illiquid assets that can’t participate in this game… and $X is much larger than ($100k)x(1.5^10)… then optimizing for my median net worth at the end no longer seems reasonable.
My best guess at your resolution to this is something like “in the second case, 10 rounds isn’t necessarily sequential enough, you need a number of iterations that depends on X”; but me having an extra $100T at home doesn’t affect my preference ordering of outcomes, and it seems to me that a median-utility-based decision theory should be insensitive to the magnitude of [my illiquid savings] vs [my bankroll].
No response required!
Great post! I think it would have been even better if you gave a short explanation of what the independence axiom is somewhere near the beginning. I only felt like I had some understanding of it when I reached this part (which is almost halfway through the post):
Thanks, I will add a note about it!
This seems false!
Let’s say that at the start of the tree, there’s a node that is not accessible to the sophisticated chooser because they’re not able to constrain themselves: e.g., they know for a fact they won’t pay in Parfit’s hitchhiker, and so they die in the desert; even though they would really want to pay later, they don’t have a way to.
Let’s say the payment is $100 and they value their life at $1m.
If you offer them a deal (they pay you $999,999 now, and in exchange, when they get to the city, you give them $100.01 if they pay the $100), they’ll happily agree: they gain $1.01 of value this way by not dying in the desert (1,000,000 − 999,999 − 100 + 100.01 = 1.01), even though if they were built some other way they could’ve had $999k+ more, because now, by paying $100 when in the city, they come out $0.01 ahead.
Now, though, after this agreement, already with so much of their money, you can add that you will actually give that agent, for free, $0.02 if they do not pay when in the city.
The agent is very sad: they’ll now die in the desert, and even paying you $999 999 didn’t save them from that fate.
Oh well, happily, you have a solution: if the agent pays you $999,999, you will pay the agent $0.02 if they do pay when in the city.
(Repeat: offer, for another $999,999, to pay $0.02 as an incentive, so that at the end the agent prefers paying to not paying by $0.01; then say you’ll also pay the agent $0.02 more if they do not pay; and so on, forever, until the agent is out of all of their utility.)
This is all downstream of the sophisticated chooser assigning extremely low probability to being offered that $0.02, despite the fact that it keeps happening.
You could “money pump” any agent if you were allowed to assume that they keep being wrong. An EUM standing by the sidelines here (with the same beliefs) would be incentivized to keep betting “this guy won’t be offered another $0.02”, and they’d also get “money-pumped” by constantly losing their bets.
Yeah, you’re right, thanks. I was only half-awake when I wrote the comment.
The agent will pay the $999k, but only a very limited number of times (e.g., once), so it’s not the good kind of money pump.
Yes, this was already discussed in the comments, and I added a footnote on that, please see footnote 3.
Well, no. I demonstrate not just leaving money on the table, as discussed by other commentators, but this kind of money-pump that extracts infinite utility out of sophisticated choice.
You’re right, I misread. This is a stronger result.
I’ll update the footnote.
Curated! I don’t feel fully competent to evaluate this post, but I gain confidence in its curation-worthiness from Habryka having endorsed it. Still, I’ll describe the many things I like about it, in no strong order.

It is earnest scholarship, engaging with an interesting and important topic and situating its reasoning amongst the work of others. Ihor didn’t just read a little or muse on the topic, but has studied the field.

The topic is fundamental, and the post is challenging the fundamentals. I value the boldness of that. Most posts that are making intellectual contributions are pushing at the edges, the frontiers, and it’s cool (assuming quality is high, and I think that’s clearly the case here even if it would turn out to be wrong) to have challenges made at core doctrine – especially as the case does feel compelling here.

The writing was pleasant to read notwithstanding a non-zero LLM score (we’re wrestling with LLM-assisted writing on LW, but this felt quite good to read). The post doesn’t fully explain all its concepts for an unfamiliar audience, but it does do some of this pedagogy in a nice way, e.g. explaining the different types of utility in a technical sense.

I model that if we had more discourse of this kind, back and forth, we’d make some pretty neat intellectual progress. I could imagine someone coming along and making some really strong counters, and I’d just love to see that back and forth. I wasn’t familiar with Ihor before, but I hope he keeps writing. Kudos.
Benja is a she.
Corrected!
“buf features” ← typo I think (in “Just give up”, point 2).
I think there is a mistake here.
The formal definition of the Independence axiom is Independence: if X>Y then pX+(1-p)Z>pY+(1-p)Z for all Z and all 0<p<1.
But the context for ergodicity theory is an infinite sequence of bets. And your decision today depending on some other bets that you might be offered in the future is standard.
Imagine your utility function is linear, except with an extra jump upwards at the $1000 mark.
You have a gamble that has a 50% chance of winning $999 and a 50% chance of losing $1,000.
If this is your only gamble, it’s not worth it. If tomorrow you can bet $10 on a fair coin, then today’s gamble is worth it.
The setup here doesn’t depend on a “context” of past bets. Only on the future bets. Which is totally ok and normal.
In your example, the utility function is fixed (linear with a jump) and the optimal strategy adapts to future opportunities via standard backward induction. In EE, the evaluation function itself is derived from the dynamics rather than assumed independently of them. Multiplicative dynamics yield log, additive dynamics yield linear, other dynamics yield whatever the ergodic mapping produces. This isn’t “fixed utility function, different strategy in different trees” but “different dynamics, different evaluation function.”
This matters because standard sequential decision theory requires you to bring a utility function to the table before you can do any backward induction. Where does it come from? In your example, you simply stipulate it (linear with a jump). In standard economic practice, it’s a free parameter fitted to behavior. EE’s claim is that you don’t need to assume or fit a utility function: the dynamics of the process determine it.
Previous comment was too complicated.
Imagine an agent with a simple indexical utility function.
If the agent’s wealth is increasing like e^(a·t + b), then the utility is (1,a,b).
If the agent’s wealth is increasing like a·t + b, then the utility is (0,a,b).
(Note that this agent doesn’t care what happens at any finite time. They care about the infinite asymptotic limit of their wealth.)
This agent gives priority to the leftmost term, only optimizing the terms to the right if it faces a tie on the left.
I think this agent behaves like the ergodic economics agent on these 2 test environments.
Ps. I don’t understand what ergodic economics would do in more complicated environments. Let’s say it doesn’t yet know whether its future bets will be linear or exponential, so it doesn’t know if it should use wealth or log wealth to evaluate the bet in front of it. What then?
Standard expected utility theory deals most easily with the case where the number of rounds is finite.
So. How do we deal with an infinite number of rounds?
First, to avoid the difficulties of infinite backwards induction, assume you take a single choice. You choose one policy out of the set of all policies.
A potential result of a policy is a wealth trajectory, a function from the naturals to the reals, representing your wealth after each timestep.
Pick a nonprincipal ultrafilter U on the naturals.
We look at the equivalence classes formed by A ~ B iff {n : A(n) = B(n)} is U-large.
This gives us a non-standard model of the reals.
Finite sums and products are easy to define on this nonstandard model, but the naive infinite sum depends on the choice of representative of the equivalence class.
The expectation calculation involves an infinite sum.
I think that, given an infinite sequence of events (random bools) B, it makes sense to define the probability that B is U-large, i.e. P({n : B(n)} ∈ U).
Ultrafilters aren’t in the usual sigma algebra by default, but I think this definition makes sense.
Now our nonstandard model of the reals is something it makes sense to define a supremum in.
So, given our wealth sequence W, we can define the median sequence A as the sequence with P(W > A) = 0.5.
Note that W and A are sequences, so the median is a sequence (up to equivalence class). The comparison W > A is actually an infinite sequence of comparisons, and we are using the “probability that an infinite sequence of bools is in an ultrafilter” construction (above).
Now we just define the expectation as ∫₀¹ f(x) dx, where f(x) is defined like the median (except with x in place of 0.5). E.g., f(x) represents the sequence that you have a chance x of exceeding.
I think that this setup yields your ergodic dynamics as just expected utility maximization. (Admittedly on a nonstandard model of the reals)
This is the part I’m most confused about.
As someone who hasn’t delved deep into the theory but has read VNM and other related LessWrong pieces, I would have thought that continuity was way more controversial. It’s fairly easy to conceptualize your utility function as being local, such that it shouldn’t be influenced by external factors.
I find it way harder to assume my valuation should be continuous and that I ought to be able to non-constructively exhibit a probability for everything (what is the p such that you don’t care whether you get p(I break your arm) + (1−p)(you gain 1 million dollars)?).
I found that this is oftentimes very contrary to human intuitions, way more than the independence axiom. For instance, against the dust-speck problem, I’ve heard a common refrain of needing to reject Archimedean-ness, and maybe this extends further.
On another note, I don’t think the Allais argument is locally valid, since a lot of different human behaviors are not rational-on-reflection, but I would expect that for all the other axioms, it’d be possible to construct experiments that violate them as well.
I wonder if there are self-consistent and useful theories if you reject continuity instead of independence.
You’re not alone. I heard Abram say that continuity is probably/perhaps the most controversial vNM axiom in a talk at ILIAD 2024 (not recorded, I’m afraid). More recently (as of Aug 2025, private conversation), he was up for rejecting both continuity and independence, and going with something more Infra-Bayesian-ish as a general basis for decision theory / utility theory.
A quick and maybe pedantic (sorry!) readability comment: “Lobachevsky” appears here without introduction and without any further mention. My guess: many readers here won’t recognize the name. For me, it felt like “wait, did I miss something?” rather than “ah yes, the parallel discovery.” Suggestion: how about “the Russian mathematician Nikolai Lobachevsky” instead?
This really helped me understand the problem with independence. So imagine a consequentialist agent in a Parfit’s hitchhiker situation. Given the option to become non-consequentialist (and thus able to carry through on a commitment to pay for a ride), it would do so, since that is the consequentialist thing to do (looking forward, it’s the best option). So consequentialism is not actually a stable state of being.
Here’s the strongest argument besides the money pump that I have for the independence axiom: independence is equivalent to “denesting” nested lotteries. For example, a 50% chance of (50% of an apple, 50% of a banana) is seen as the same as (25% of an apple, 25% of a banana). But to me, this surely has to hold—to do otherwise is as if to treat the “gambling” operation as special, instead of viewing each possible ultimate consequence and associated probability. So if you consider just the ultimate outcomes, you’ll satisfy independence.
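For concreteness, a tiny sketch of the denesting operation (my own illustration):

```python
# A compound lottery reduces to a single distribution over ultimate outcomes.
def flatten(compound):
    """compound: list of (prob, item); item is an outcome or a nested lottery."""
    flat = {}
    for p, item in compound:
        if isinstance(item, list):                 # nested lottery: distribute p
            for x, q in flatten(item):
                flat[x] = flat.get(x, 0.0) + p * q
        else:                                      # terminal outcome
            flat[item] = flat.get(item, 0.0) + p
    return list(flat.items())

nested = [(0.5, [(0.5, "apple"), (0.5, "banana")]), (0.5, "nothing")]
print(flatten(nested))   # [('apple', 0.25), ('banana', 0.25), ('nothing', 0.5)]
```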
But then, in what sense do you violate independence? Suppose you choose 2B over 2A in the Allais paradox, and then after the coin is flipped you still choose 1B. In what sense do you “prefer” 1A to 1B, then? What we care about is revealed preference.
I don’t think this is correct. Commitments that you follow through on route through some “logical dependence” that makes the worlds you consider yourself as choosing between different. Consider the resolute chooser in the Allais paradox. To them, the globally optimal choice after the coin is flipped is to choose 1A: if it’s heads they prefer it, and if it’s tails they get nothing no matter what. So why isn’t the calculated globally optimal plan to pick 2B and then switch to 1A after the coin is flipped?
Huh? Don’t the dynamics directly affect the outcomes each gamble provides? I don’t see where the context-dependence occurs—you should elaborate. But I’d also say: clearly their utility function is over long-run growth, not over the individual outcomes of the gambles. When they view their gambles as actually being lotteries over the limiting growth rate, I think they should still satisfy independence.
Overall, I liked the post, I hope to see more discussion of ‘coherence theorems’.
In the sense that you would choose 1A over 1B if those were the only options in a one-shot decision. The money-pump arguments might constrain your patterns of choices over some sequences. But this tells you nothing about what you should choose in other contexts.
I’d also recommend Carlsmith’s discussion of similar problems with money-pump arguments here. (I have other problems with such arguments, but they’re tricky to unpack here.)
I haven’t thought more about it yet; but I’ll say already that yes, I did miss that obvious point here, sorry, and thanks for pointing it out.
I’m sure there is something very important here, but I’m not sure that the independence axiom is the right target.
Take the example that begins at the sentence “Suppose your preference between gambles A and B depends on what the common component C is”. When I observe C, I have new information. There is nothing strange about making my decision between A or B conditional upon new information. The axiom is about irrelevant alternatives, as in the classical example of switching from apple pie to blueberry pie on discovering that cherry pie is also on the menu. But C is described as a “common component”, which does not sound like something irrelevant, but more like discovering that the fruit pie comes with a scoop of ice cream. A fully concrete example would be useful. As it is, there is nothing strange about the plan “if C, do B, otherwise do A”, and no way for this to be money-pumped when I and the would-be money-pumper are both ignorant about the outcome of C.
Similarly in the paragraphs on “Resolute choice” and “Sophisticated choice”. I am not seeing what examples of these might be. Consulting the linked paper on the latter does not enlighten me. From a brief glance, its section III builds in an assumption that the decision maker must fix on a plan that cannot be conditional on information that will be obtained in the future. Clausewitz famously had something to say about that.
The examples that I think tell more against EUT are about long-run gains in non-ergodic systems. One could defend EUT even there by stipulating that the long-term growth rate is the utility of a proposed gambling scheme. However, such manoeuvres would amount to redefining EUT into something that still applies in order to preserve the name “EUT”, which is perhaps as futile as redefining phlogiston to mean negative oxygen.
ETA: This from Samuelson’s “Probability, Utility, and the Independence Axiom” is illuminating. The paper on the whole is defending the independence axiom, but near the end he writes:
...
I think there’s a specific clarification that might resolve the main concern.
The independence axiom is not about conditioning on observed information. If you observe C and then choose between A and B, conditioning your choice on what you observed is perfectly rational under EUT as well and nobody disputes this. The independence axiom constrains something different: your ex ante preference between two compound lotteries, before any uncertainty resolves.
Concretely: suppose you must choose right now between two lotteries, both of which contain the same common component C mixed in at the same probability:
Lottery 1: 90% chance of C, 10% chance of A
Lottery 2: 90% chance of C, 10% chance of B
Independence says your preference between these two lotteries must be the same regardless of what C is. You don’t observe C before choosing. You choose between two complete packages.
The holistic objection is: the identity of C changes the overall risk profile of the package. Suppose A is risky (50/50 chance of doubling or halving your wealth) and B is safe ($5K certain gain). If C = “$1M for certain,” then both lotteries give you a 90% safety net, and you can afford to take the risky A in the remaining 10% branch. But if C = “$0,” then both lotteries give you a 90% chance of nothing, and in the remaining 10% branch, the safe B becomes much more valuable. Your ranking of the two packages flips, not because you observed C and updated, but because the overall distribution of the package changes with C even though C is common to both.
The conditional plan “if C, do B, otherwise do A” that you describe is a strategy for a sequential decision problem, which is indeed unproblematic. But the independence axiom applies to the static choice between compound lotteries, where C is not observed but mixed in. Your restaurant analogy of apple/blueberry/cherry pie is, as I see it, actually closer to “independence of irrelevant alternatives” (the Arrow axiom), which is a different axiom in a different framework. The vNM independence axiom is more like choosing between two fixed price menus that share the same dessert but differ in the main course: the axiom says which menu you prefer shouldn’t depend on the shared dessert, but a holistic diner evaluating the total meal might reasonably disagree.
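To make the holistic diner’s flip concrete, here is a toy calculation (my own construction: the holistic evaluator scores each whole package by mean minus one standard deviation, and the wealth figures are assumptions):

```python
import math

W = 100_000  # current wealth, assumed for illustration

def packages(C):  # C = the common component's certain payout
    with_A = [(0.9, W + C), (0.05, 2 * W), (0.05, W // 2)]  # 10% branch = risky A
    with_B = [(0.9, W + C), (0.1, W + 5_000)]               # 10% branch = safe B
    return with_A, with_B

def value(lottery):  # a holistic, non-EU score of the whole package
    mean = sum(p * x for p, x in lottery)
    var = sum(p * (x - mean) ** 2 for p, x in lottery)
    return mean - math.sqrt(var)

for C in (1_000_000, 0):
    with_A, with_B = packages(C)
    print(C, "prefer A-package" if value(with_A) > value(with_B) else "prefer B-package")
# C = $1M: the A-package wins; C = $0: the B-package wins. Same C in both
# packages, yet the ranking flips -- exactly the pattern independence forbids.
```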
Upon thinking about it, I agree that the resolute/sophisticated choice sections would benefit from more concrete examples, and I really agree with the point in your final paragraph.
I’m having trouble seeing how this works. Regardless of whether C is in the pool, I run a 5% risk of halving my wealth by taking the first gamble. I think the safety net metaphor only makes sense if the outcome can’t be worse than C, but in this example it seems like there’s a hole in the net I can fall through.
Fascinatingly enough, ordinal vs. cardinal utility seems to be a major part of Tyler’s new book: https://tylercowen.com/marginal-revolution-generative-boo
Considering the confusion in some comments, I think it is good to clarify the distinction between two dimensions of decision theory, so that you can see better where exactly my argument fits in.
Dimension 1: How do you evaluate a probability distribution over outcomes?
This is where EUT, ergodicity economics, rank-dependent utility, prospect theory, and the independence axiom debate all live. The question is: given that action A leads to probability distribution D_A over outcomes, how do you assign a scalar value to D_A so you can compare it with D_B?
Dimension 2: How do you determine which probability distribution each action leads to?
This is where CDT, EDT, FDT, UDT, and the Newcomb’s problem debates live. The question is: given that I’m considering action A, how do I figure out what distribution over outcomes A produces?
These dimensions are orthogonal:
One can, in principle, combine any answer from dimension 1 with any answer from dimension 2. One could do CDT plus EU maximization (the standard economics setup). Or CDT plus EE (use causal reasoning to determine what each strategy leads to, then evaluate strategies by time-average growth rate). One could do FDT plus EU maximization (use functional reasoning to handle Newcomb-like problems, but evaluate the resulting distributions by taking expectations).
My post is fundamentally about dimension 1.
It is about the evaluation functional (should it be an expectation? is it forced by the independence axiom?) rather than about the causal or strategic structure of decision problems (CDT vs EDT vs FDT vs UDT).
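For concreteness, here is what three dimension-1 evaluation functionals look like side by side (a sketch with my own function names; the rank-dependent variant shown is one simple form of probability weighting, not the only one):

```python
import math

# Each functional maps a distribution (a list of (prob, outcome) pairs) to a scalar.
def expected_value(dist):
    return sum(p * x for p, x in dist)

def expected_utility(dist, u=math.log):
    return sum(p * u(x) for p, x in dist)

def rank_dependent(dist, u=math.log, w=lambda q: q ** 2):
    # Sort outcomes worst-first and weight each by the w-transformed
    # increment of the cumulative probability.
    dist = sorted(dist, key=lambda px: px[1])
    total, cum = 0.0, 0.0
    for p, x in dist:
        total += (w(cum + p) - w(cum)) * u(x)
        cum += p
    return total

gamble = [(0.5, 50.0), (0.5, 200.0)]
print(expected_value(gamble), expected_utility(gamble), rank_dependent(gamble))
```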
OK, I see you’ve confused Independence of Irrelevant Alternatives with VNM-Independence here. I don’t want to be too harsh, because this is probably my fault, but this basically completely undermines your core argument, since you haven’t actually refuted the Dutch Book for VNM-independence. In short, c. 2022:
I made an edit describing VNM-independence as a probabilistic version of IIA in the Wikipedia article on the VNM theorem. By this I meant that they share a common structure, in that both are about ignoring details that obviously shouldn’t affect the outcome, but several people misunderstood it as saying that VNM-independence encompasses IIA (it doesn’t).
In this StackExchange comment, I answered the question in the OP’s title without explaining that the body of their question was confusing the two axioms.
I’ve since fixed both of these, but for a while these were two of the first things that showed up when you Googled the topic, so chances are you read them.
There are several other major misunderstandings in this post that trace back to Wikipedia articles that I, unfortunately, haven’t had time to correct. In general: Wikipedia articles and online resources on rational choice or decision theory are not reliable. I do not recommend reading them. They consistently make several basic errors that show up in this post, like conflating descriptive and prescriptive validity, confusing expected value with expected utility, and citing confused philosophers as if they represent substantial viewpoints (when their writings contain mathematical errors that have been repeatedly pointed out by experts).
I recommend plugging this into Gemini and asking it to explain all the mistakes in this post, which it should be able to do; it did a good job explaining the difference between IIA and VNM-independence when I tried it out.
Good post! I generally agree with the general thrust of being more flexible about what counts as rational behaviour, but about one specific example, the Allais paradox: do we even need to abandon EUT to explain it? In practice, the bet involves money, and if we do have a utility function, it isn’t over “money” at all; it’s over whatever psychological benefits (safety, comfort, pleasure) money can afford us.
Now on one hand we have five million compared to one million, and we know the growth of the utility of money is somewhat sublinear—the first million is a lot more valuable to us than the second, as we have a hard floor of physical needs. On the other, we have the fact that, depending on who we are, getting nothing might cause us significant distress—regret at a missed opportunity, guilt towards our dependents. So that zero really is a quite large negative number, potentially. The same regret isn’t there if our choice didn’t really give us a certainty option. So in that framework, the use of money is what’s misleading—it doesn’t rule out a real underlying utility function; it’s just not affinely equivalent to it.
Please see note 5.
Notably, this implies that the sunk cost fallacy might not be a fallacy at all, at least in the general case.
I just want to flag that there is another family of non-independent decision rules called multiplier preferences or more generally variational preferences which result in a certain kind of adversarial robustness to Knightian uncertainty. Multiplier preferences in particular have a very natural interpretation in terms of information geometry and can be seen as roughly the dual of active inference. Where active inference says “my utility function can be seen as a prior”, or “I can deviate from my policy during planning by paying a KL cost”, multiplier preferences say “Nature can adversarially deviate from my model of it by paying a KL cost”.
Nice post! Another place where nonlinear utilities have been considered is in technical game theory / regret minimization literature. A consideration for introducing them is computational complexity—while VNM utility applies, it may be inefficient to linearize the utilities in terms of representation and downstream algorithms. An example is here, where a log-utility game is considered due to complexity/efficiency considerations.
I think you missed this prior discussion of non-independent decision making.
You might also be interested in my post relating ergodicity and utility. This proposal actually is independent, which works because it has an infinite set of possible outcomes (which also violates vNM).
I’m generally open to non-independence, but I don’t like resolute choice, because it has a special “time of strategy fixing” as an additional fact. This doesn’t play so well once we leave the world of toy models that begin with their game. Sophisticated choice is even worse, because it assumes a definite end. The specific non-independent things people want to do often don’t have this problem, so I think it’s worth having some kind of theoretical intermediate here.
I don’t see how this is a contrast. Is this not what an ergodic strategy is doing, also? A resolute choice? All updateless decision theories are stepping outside time; there’s a reason the first step in that direction was called “timeless” decision theory. Benja’s pretty explicit that these plans include probabilities; it is selecting choices that make sense at particular intermediate steps. If the correct strategy for achieving long-run growth (the one that optimizes the bankroll at the time labeled by the genie as “outcomes”) is to Kelly-bet, and that’s what you care about, the genie will choose to Kelly-bet.
However, it will not do that, because long-run growth is not what we care about. Optimizing long-run growth is correct, AIUI, only if losing your last dollar is infinitely bad and there is no bankroll size at which you cease to desire it to grow at the same strength you presently desire it. Certainly I prefer a quadrillion dollars to a trillion, but if I have a trillion (in 2025!USD-equivalent), I will cease to be particularly interested in what choice optimizes the long-run growth of that bankroll, and will instead be much more conservative.
I am far from certain. But I believe this thought experiment deviates from what you prescribe only in circumstances where you become the one relying on unnecessary and inappropriate assumptions. It looks to me like the argument there continues to hold for exactly the processes you suggest are superior.
Take any set of points a, b, c, …
Take any concave function f (= a function such that any line segment between two points on its graph lies below the graph, = a function which is concave down, = a function whose second derivative (assuming it has one) is never positive) which can be applied to those points.
Then the mean of the function’s values, mean(f(a), f(b), f(c), …), is at most the function of the mean, f(mean(a, b, c, …)). (This is Jensen’s inequality; the logarithm is such a function.)
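A numerical spot-check (mine) with the concave function f = ln:

```python
import math, random

random.seed(0)
for _ in range(1_000):
    xs = [random.uniform(0.1, 100.0) for _ in range(10)]
    mean_of_f = sum(math.log(x) for x in xs) / len(xs)   # mean of the logs
    f_of_mean = math.log(sum(xs) / len(xs))              # log of the mean
    assert mean_of_f <= f_of_mean                        # never fails
```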
“Ergodic” kept looking like noise to me, and this seemed to be intended as an explanation but, to me, clarified nothing. (Feldspar’s problem, presumably.) I’ll attempt to rephrase, reflecting my present understanding after a few minutes of research:
Most processes that vary do so differently over time than at any present moment. The ensemble average looks at a bunch of similar processes and how they’re varying right now, like forming the S&P 500 from 500 individual stocks, essentially buying the ensemble average of their growth (now and in the future). The time average of the growth is the long-run performance of one stock; the (past and/or future) time average of any individual stock is not the same as the ensemble’s. This makes prediction hard.
Something is ergodic if it does not have this problem. That could mean a steady state; a dollar bill’s value, denominated in USD, is mostly ergodic, though imperfectly because we might someday go full Weimar, abolish greenbacks, or make it an old print that’s a collector’s item. But you can construct things to be ergodic on purpose, as the examples describe; Kelly-betting is a strategy whose ‘many-worlds’ possible present instances have the same bankroll growth ensemble average that, in the long run, the time average predicts.
I am unclear if ‘the ergodic mapping’ is a specific coherent concept or a more flexible process that you apply to a phenomenon in imprecise ways to arrive at a result you can then prove is ergodic.
I haven’t had much exposure to this discussion so I might be missing something basic, but I am somewhat confused as to what would actually count as evidence here.
It seems that if someone shows behaviour like Allais or Ellsberg, we can either say “they’re violating independence” or “the outcomes weren’t specified richly enough.”
With this in mind, are there any possible patterns of choice(s) that would clearly count as a real violation, rather than just something that can be explained away by redefining outcomes?
Great post! I have a small question about the sophisticated chooser. I see that it was clarified that it can potentially be money-pumped. Does that mean that it doesn’t satisfy dynamic consistency?
What about consequentialism? Does that depend on whether the local preferences depend or not on closed branches? (Is that the definition of “locality”?)
Thanks!
The sophisticated chooser does satisfy dynamic consistency, by construction.
So the money-pump vulnerability isn’t from deviating from a plan; it’s from the plan itself being ex ante dominated. There exists a better plan, but executing it would require behaving at some future node in a way that isn’t locally optimal, which the sophisticated chooser won’t do.
And they also satisfy consequentialism: at each node, they choose based solely on what’s available from that node forward, ignoring closed branches.
So their node-level behavior satisfies independence. At each individual choice point, they act as an EU maximizer would. The problem is that their underlying preferences over compound lotteries may violate independence, and forcing those preferences through the consequentialism-plus-dynamic-consistency filter produces globally suboptimal plans.
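A toy numeric illustration of that wedge (my own construction: I assume rank-dependent preferences with a Prelec probability weighting as the independence-violating example; none of these numbers come from the post):

```python
import math

def w(p: float) -> float:
    """Prelec probability weighting, w(p) = exp(-(-ln p)^0.5), w(1) = 1.
    Its subproportionality produces common-ratio (Allais-style) choices."""
    return math.exp(-((-math.log(p)) ** 0.5)) if p < 1.0 else 1.0

def v(p: float, x: float) -> float:
    """Value of the binary lottery 'x with probability p, else 0'."""
    return w(p) * x

# Tree: with prob 0.5 nature pays 0; otherwise a choice node offering
#   S = 100 for sure, or R = 200 with probability 0.5.
print(v(1.0, 100), v(0.5, 200))    # 100.0 vs ~87.0 -> the node picks S
# The same preferences applied to complete plans from the root:
print(v(0.5, 100), v(0.25, 200))   # ~43.5 vs ~61.6 -> plan R ranks higher
```

Backward induction at the node picks S, so the executed plan is “100 with probability 0.5”, even though the very same preferences rank the plan “200 with probability 0.25” higher ex ante. Each node-level choice looks locally rational; the resulting plan is still worse by the agent’s own lights.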
Can someone clarify this passage for me? I find myself increasingly confused. Earlier, we assume the agent can form a plan: “if the coin comes up heads (no C), I will choose A; if the coin comes up tails, I will choose B (with C)”. How can I be money-pumped? I don’t violate dynamic consistency, nor do I violate consequentialism. Yet I violate independence, and I can’t be money-pumped. I can’t be convinced to pre-commit to either B or A, since there are no predictors involved, and I can just postpone my actual choice.
Edit: Actually, I don’t violate independence either; these are simply different outcomes. So I don’t understand this argument at all.
This seems untrue. “A rational agent computes the probability-weighted average of their subjective values across all possible outcomes” isn’t the same as the agent taking the expected value of f1. The expected value of f1 doesn’t carry any meaning at all: f1 is ordinal, not cardinal. I could prefer two apples to one apple only slightly, yet f1(two apples) could be vastly larger than f1(one apple) without violating Debreu’s theorem.
What this actually says is that the agent takes f2 and takes its expected value across all possible outcomes. This is exactly what a VNM agent does per the original theorem, and it is true, per my understanding, that agents “value gambles at the weighted sum of how much they value each possible result”.
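To illustrate the ordinal point with a sketch of my own (toy numbers): two functions that order sure outcomes identically can disagree wildly when you take naive expectations of them, which is why the expected value of an ordinal representation carries no information.

```python
# f and g order sure outcomes identically (g is a monotone transform
# of f), so both represent the same ordinal preferences.
f = lambda apples: apples
g = lambda apples: apples ** 10

def expectation(h, lottery):  # lottery: list of (probability, outcome)
    return sum(p * h(x) for p, x in lottery)

gamble = [(0.5, 0), (0.5, 2)]   # half a chance at two apples
sure = [(1.0, 1)]               # one apple for certain

print(expectation(f, gamble), expectation(f, sure))  # 1.0 vs 1.0: a tie
print(expectation(g, gamble), expectation(g, sure))  # 512.0 vs 1.0
```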
It seems untrue because it is untrue: in that paragraph I describe how the claim may sound to a casual listener, not what is correct.
My immediate first line of thought with this, which I admit I can’t guarantee is relevant, is that in any given real-world scenario there are multiple competing models of what game you are even playing, and therefore:
Game-theoretically, one takes the action appropriate to the average of outcomes across games, weighted by the probability of each game being the one you are playing (see the sketch after this list).
Probing actions can radically gain value as the number of plausible distinct games increases, or as their distinction increases or polarizes, or as the proportionality of each possibility to the number of possibilities increases. In a game like poker, per Sklansky, probing has negligible value because you are gaining very limited information. But in most real-world contexts probing gains value, to the point that error develops innate value, albeit often not for the person committing it.
If you don’t know what game you are in, and especially if you expect never to have certainty, optimizing play for a single game or any fixed bundle of games is actually very bad.
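Here is a minimal sketch of the first point, with a probing action bolted on (everything here is my own toy construction: the two candidate games, their probabilities, and the probe cost are all made up):

```python
# Hypothetical two-game setup; all payoffs invented for illustration.
games = {"game_A": 0.6, "game_B": 0.4}   # P(each game is the real one)
payoff = {                                # payoff[action][game]
    "action_1": {"game_A": 10, "game_B": -5},
    "action_2": {"game_A": -5, "game_B": 10},
}

def mixed_value(action: str) -> float:
    # Probability-weighted payoff of one fixed action across games.
    return sum(p * payoff[action][g] for g, p in games.items())

best_fixed = max(payoff, key=mixed_value)
print(best_fixed, mixed_value(best_fixed))          # action_1 4.0

# A probe that reveals the game at cost 2, then plays optimally:
probe_value = sum(p * max(payoff[a][g] for a in payoff)
                  for g, p in games.items()) - 2
print("probe value:", probe_value)                  # 8.0
```

In this construction the best fixed action is worth 4.0 averaged over games, while probing and then playing optimally is worth 8.0; the probe’s edge grows as the candidate games become more distinct, which is the second point above.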
As someone whose decision making processes are basically an attempt to progressively expand poker decision theory to broader usability, this post’s distinction between expected value and expectation of logarithmic growth rates obviously has some immense significance to how this chain of thought would develop for me if I were smarter.
Misc thought I had that seemed related and occurred simultaneously, but that doesn’t have a place when I assemble the other thoughts into a structure: people who make initial errors that put them on bad branches of a decision tree often become good at finding and consistently choosing the local optimum of that branch. I guess the ex post facto attempt to make this relevant to the above is “even self-handicapping through premature optimization has its uses”.
I will say that in money poker games my capacity for play died when I moved away from Sklansky and towards Janda, and this seems to have occurred at around the same time everyone stopped reading poker books at all and started using solvers. So the time-embedded-agent thing is not only entirely relevant even to what are basically toy cases in real life, albeit with stakes, but the shift to relevance started happening sometime around the mid-2010s. E.g., the environment itself already seems to be converging on ergodicity and a time-embedded understanding of decision making in high-level play, independently of developments in decision theory.
I feel confused by this example.
If I understand correctly, let me pick a more specific example.
t0: someone decides to toss a biased coin that is more likely to land tails. If it lands heads, they put $1M in box H; if it lands tails, they put it in box T.
The adversary offers me: (a) I’ll take H, free of charge; or (b) I’ll take T, but I have to pay $0.01.
Then, after the coin toss, they offer to let me make a switch for $0.01.
To me this doesn’t sound like a money pump.
If I knew offer 2 was coming, then I’m simply silly if I take the cheaper option; I’m plainly throwing money away for no reason.
If I didn’t know, then it seems I’m paying for options / for a lack of information, not for making bad decisions.
Ah. I misunderstood. I missed that the uncertainty is on a different thing.
Thank you for this overview! Unfortunately, I struggle to understand how your proposal differs from a variant of Updateless or Functional decision theory representing decisions as a preset function of observations, chosen so as to yield the best outcome. The trajectories on the possibilities tree still have to be sorted (e.g. by assigning some utility function to each trajectory or to precommitments that the agent makes and keeps or fails along with the outcomes of the branch).
Thanks for your comment!
My article is primarily a critique of the independence axiom, not a proposal for a specific alternative decision theory. So I can’t really answer “how does this differ from UDT/FDT”, because there is no “this” in that sense. But as far as I understand your question, the connection you’re noticing is real and is actually part of the argument.
I argue explicitly (in the section on Garrabrant’s comment) that updatelessness and the ergodicity economics critique converge on the same structural insight: both reject branch-by-branch post-update evaluation in favor of holistic policy-level or trajectory-level optimization. So the parallel to UDT/FDT is a feature, not a bug.
On your second point: yes, you still need to rank trajectories or policies. Dropping independence doesn’t mean dropping preferences. What it means is that this ranking need not decompose into the expected value of any function. You can have a complete, transitive preference ordering over trajectories that depends on global properties of the trajectory (for example, the time-average growth rate) in ways that no branch-by-branch expected utility can capture. The ranking exists and is well-defined; it’s just not an EU ranking.
I emphasize ergodicity economics here because it provides an example of a principled, non-arbitrary way to construct this trajectory-level objective: derive it from the dynamics of the stochastic process via the ergodic mapping, rather than postulating a utility function and taking its expectation. UDT tells you to optimize over policies but doesn’t specify what to optimize. EE gives you a specific answer grounded in the mathematical structure of the process you’re embedded in.
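For readers who want the contrast concrete, a minimal sketch of what “derive the objective from the dynamics” can look like, assuming the two standard cases from the EE literature (additive dynamics giving a linear mapping, multiplicative dynamics giving log; the wealth numbers are arbitrary):

```python
import math

def growth_rate(wealth, outcomes, probs, dynamic):
    # Ergodic mapping: identity for additive dynamics, log for
    # multiplicative dynamics; the criterion is the expected per-round
    # change of the mapped wealth, not the expectation of a postulated
    # utility function.
    v = (lambda x: x) if dynamic == "additive" else math.log
    return sum(p * (v(w_next) - v(wealth))
               for p, w_next in zip(probs, outcomes))

# The 1.5x / 0.6x coin from the ergodicity example, starting at 100:
outcomes, probs = [150.0, 60.0], [0.5, 0.5]
print(growth_rate(100.0, outcomes, probs, "additive"))        # +5.0
print(growth_rate(100.0, outcomes, probs, "multiplicative"))  # ~ -0.053
```

Under additive dynamics the criterion coincides with expected value; under multiplicative dynamics it coincides with expected log growth, which is where Kelly comes back in.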
As a lifelong “money pump” this has definitely given me something to think about.
In footnote 2, you write “So McClennen’s book is actually the key source that sets up the three-way taxonomy: naive, sophisticated, and resolute.” But nowhere else in this post do you refer to anything as “naive”, so this is a confusing choice of language.
Honestly that sentence sounds to me like something said by an LLM conducting a literature review which then wasn’t edited to make sure it still flows well in the final version of the post.