# Meetup Notes: Ole Peters on ergodicity

Ole Peters claims that the standard expected utility toolbox for evaluating wagers is a flawed basis for rational decisionmaking. In particular, it commonly fails to take into account that an investor/bettor taking a series of repeated bets is not an ergodic process.

Optimization Process, internety, myself, and a couple others spent about 5 hours across a couple of Seattle meetups investigating what Peters was saying.

# Background

## Why do we care?

Proximally, because Nassim Taleb is bananas about ergodicity.

More interestingly, expected utility maximization is widely accepted as the basis for rational decisionmaking. Finding flaws (or at least pathologies) in this foundation is therefore quite high leverage.

A specific example: many people’s retirement investment strategies might be said to be taking the “ensemble average” as their optimization target—i.e. their portfolios are built on the assumption that, every year, an individual investor should make the choice that, when averaged across (e.g.) 100,000 investors making that choice for that year, will maximize the mean wealth (or mean utility) of investors in the group at the end of that year. It’s claimed that this means that individual retirement plans can’t work because many individuals will, in actuality, eventually be impoverished by market swings, and that social insurance schemes (e.g. Social Security) where the current rich are transferring wealth to the current poor avoid this pitfall.

Claims about shortcomings in expected utility maximization are also interesting because I’ve felt vaguely confused for a long time about why expected value/utility is the right way to evaluate decisions; it seems like I might be more strongly interested in something like “the 99th percentile outcome for the overall utility generated over my lifetime”. Any work that promises to pick at the corners of EU maximization is worth looking at.

## What does existing non-Peters theory say?

The Von Neumann-Morgenstern theorem says, loosely, that all rational actors are maximizing *some* utility function in expectation. It’s almost certainly not the case that Ole Peters has produced a counterexample, but (again) identifying apparently pathological behavior implied by the VNM math would be quite useful.

Economics research as a whole tends to take it as given that individual actors are trying to maximize, in expectation, the logarithm of their wealth (or some similar risk-averse function mapping wealth to utility).

# Specific claims made by Peters et al.

We were pretty confused about this and spent a bunch of investigation time simply nailing down what was being claimed!

Expected utility maximization has some major pathologies. (didn’t have time to dig through this paper enough to identify the specific pathologies claimed)

# What we learned

## 1.5x/0.6x coin flip bet

This is a specific example from https://medium.com/fresheconomicthinking/revisiting-the-mathematics-of-economic-expectations-66bc9ad8f605

Here’s what we concluded. [These tags explain the level of proof we used.]

It is indeed the case that playing many, many rounds of this bet compresses almost all the winnings into a tiny corner of probability space, with “lost a bunch of money” being the overwhelming majority of outcomes. [math proof]

Betting only a tiny, constant chunk of your bankroll every time instead of all your money at once does, as expected, make you richer most of the time. [Monte Carlo simulation, intuition]

Reasoning about what happens over a gazillion rounds of the game is a little bunk because you don’t have to commit to play a zillion rounds up front. [hand-waving math intuition]

That is, if someone is choosing, every round, whether or not to keep playing the game, then it is a red herring to argue that their decision in round N to keep playing is dumb because it would be a terrible idea to commit up front to playing a gazillion (>> N) rounds.
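A minimal Monte Carlo sketch of the first two conclusions above. It assumes "a tiny, constant chunk" means a constant fraction of current wealth (10% here, an arbitrary illustrative choice), and the function name, round count, and trial count are likewise made up for illustration rather than taken from the original simulation.

```python
import math
import random
import statistics

def simulate(fraction, rounds, trials, seed=0):
    """Final wealth (starting from 1.0) for many independent players.

    Each round a player stakes `fraction` of current wealth on a fair coin:
    heads multiplies the stake by 1.5, tails multiplies it by 0.6.
    """
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        log_wealth = 0.0  # work in log space so tiny wealth values stay representable
        for _ in range(rounds):
            multiplier = 1.5 if rng.random() < 0.5 else 0.6
            log_wealth += math.log(1.0 - fraction + fraction * multiplier)
        finals.append(math.exp(log_wealth))
    return finals

all_in = simulate(fraction=1.0, rounds=1000, trials=1000)
small = simulate(fraction=0.1, rounds=1000, trials=1000)
share_lost = sum(w < 1 for w in all_in) / len(all_in)

print("all-in: median final wealth =", statistics.median(all_in))
print("all-in: share of players who lost money =", share_lost)
print("10% stake: median final wealth =", statistics.median(small))
```

The sample mean is deliberately not printed: for the all-in strategy the true mean is 1.05 per round compounded, but it is carried by outcomes so rare that a simulation of this size essentially never samples them.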

## “Rich house, poor player” theorems

The “coin flip” example of the previous section is claimed to be interesting because most players go bankrupt, despite every wager offered being positive expected value to the player.

So then an interesting question arises: can some rich “house” exploit some less-rich “player” by offering a positive-expected-value wager that the player will always choose to accept, but that leads with near certainty to the player’s bankruptcy when played indefinitely?

(As noted in the last section, no log-wealth-utility player would take even the first bet, so we chose to steelman/simplify by assuming that wealth == utility (either adjusting the gamble so that it *is* positive expected utility, or adjusting the player to have utility linear in wealth))

We think it’s pretty obvious that, if the house can fund wagers whose player-utility is unbounded (either the house has infinity money, or the player has some convenient utility function), then, yes, the house can almost surely bankrupt the player.

So, instead, consider a house that has some finite amount of money. We have a half-baked math proof ([1] [2]) that there can’t exist a way for the house to almost-surely (defined as “drive the probability of bankruptcy to above (1 - epsilon) for any given epsilon”) bankrupt the player.

Tangentially: there’s a symmetry issue here: you can just as well say “the house will eventually go bankrupt” if the house will be repeatedly playing some game with unbounded max payoff with many players. However, note that zero-sum games that neither party deems wise to play are not unheard of; risk-averse agents don’t want to play *any* zero-sum games at fair odds!

## Paper: The time resolution of the St Petersburg Paradox

This paper claims to apply Peters’s time-average (instead of ensemble-average) methods to resolve the St. Petersburg Paradox, and to derive “utility logarithmic in wealth” as a straightforward implication of the time-average reasoning he uses.

We spent about an hour trying to digest this. Unfortunately, academic math papers are often impenetrable even when they’re making correct statements using mathematical tools the reader is familiar with, so we’re not sure of our conclusions.

Optimization Process also pointed out that equation (6.6) doesn’t really make sense for a lottery where the payout is always zero.

This paper works from the assumption that the player is trying to maximize (in expectation) the exponential growth rate of their wealth. We noticed that this *is* the log-wealth-maximizer; i.e., in order to get from “maximizes growth” to “maximizes the logarithm of wealth”, you don’t seem to actually need whatever derivation Peters’s paper is making.
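To make the "growth maximizer is the log-wealth maximizer" observation concrete, here is a rough sketch of how a log-wealth player would evaluate a St. Petersburg ticket. It is not a reproduction of any equation in the paper; the payout convention (2^(k-1) dollars when the first heads lands on toss k), the truncation at 200 tosses, and the function name are all assumptions made for illustration.

```python
import math

def expected_log_growth(wealth, price, max_tosses=200):
    """Expected change in log-wealth from buying one ticket.

    Assumes the lottery pays 2**(k-1) dollars when the first heads
    appears on toss k (probability 2**-k), truncated at max_tosses.
    """
    if price >= wealth + 1:
        raise ValueError("worst case would leave non-positive wealth")
    total = 0.0
    for k in range(1, max_tosses + 1):
        payout = 2.0 ** (k - 1)
        total += 2.0 ** -k * math.log((wealth - price + payout) / wealth)
    return total

# A player with $100 evaluating a cheap ticket and an expensive one.
print(expected_log_growth(100, 1))
print(expected_log_growth(100, 50))
```

Under these assumptions the sum converges for any affordable price, so the player has a finite willingness to pay that depends on current wealth, even though the expected dollar payout diverges.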

# Conclusions

We still don’t understand what “the problem with expected utility” is that Peters is pointing at. It seems like expected utility with a risk-averse utility function is sufficient to make appropriate choices in the 1.5x/0.6x flip and St. Petersburg gambles.

Peters’s time-average vs. ensemble-average St. Petersburg paper either has broken math, or we don’t understand it. Either way, we’re still confused about the time- vs. ensemble-average distinction’s application to gambles.

Peters’s St. Petersburg Paradox paper does derive something equivalent to log-wealth-utility from maximizing expected growth rate, but maybe this is an elaborate exercise in begging the question by assuming “maximize expected growth rate” as the goal.

I, personally, am unimpressed by Peters’s claims, and I don’t intend to spend more brainpower investigating them.

I haven’t read the material extensively (I’ve skimmed it), but here’s what I think is wrong with the time-average-vs-ensemble-average argument and my attempt to steelman it.

It seems very plausible to me that you’re right about the question-begging nature of Peters’s version of the argument; it seems like by maximizing expected growth rate, you’re maximizing log wealth.

But I also think he’s trying to point at something real.

In the presentation where he uses the 1.5x/0.6x bet example, Peters shows how “expected utility over time” is an increasing line (this is the “ensemble average”—averaging across possibilities at each time), whereas the actual payout for any player looks like a straight downward line (in log-wealth) if we zoom out over enough iterations. There’s no funny business here—yes, he’s taking a log, but that’s just the best way of graphing the phenomenon. It’s still true that you lose almost surely if you keep playing this game longer and longer.

This is a real phenomenon. But, how do we formalize an alternative optimization criterion from it? How do we make decisions in a way which “aggregates over time rather than over ensemble”? It’s natural to try to formalize something in log-wealth space since that’s where we see a straight line, but as you said, that’s question-begging.

Well, a (fairly general) special case of log-wealth maximization is the Kelly criterion. How do people justify that? Wikipedia’s current “proof” section includes a heuristic argument which runs roughly as follows:

Imagine you’re placing bets in the same way a large number of times, N.

By the law of large numbers, the frequency of wins and losses approximately equals their probabilities.

Optimize total wealth at time N under the assumption that the frequencies equal the probabilities. You get the Kelly criterion.
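The three steps above can be sketched numerically for the 1.5x/0.6x bet. This is a minimal illustration, not Wikipedia's derivation: it assumes the player stakes a constant fraction of wealth each round, and the function name and the 1% grid are illustrative choices.

```python
import math

def typical_log_growth(fraction, p_win=0.5, win_mult=1.5, lose_mult=0.6):
    """Per-round log-growth when wins and losses occur at exactly their probabilities."""
    win = 1.0 - fraction + fraction * win_mult
    lose = 1.0 - fraction + fraction * lose_mult
    return p_win * math.log(win) + (1.0 - p_win) * math.log(lose)

# Maximizing wealth at time N under "frequencies equal probabilities" is the
# same as maximizing the per-round log-growth, so a grid search suffices.
fractions = [i / 100 for i in range(101)]
best_fraction = max(fractions, key=typical_log_growth)

print("best stake fraction on the grid:", best_fraction)
print("log-growth when staking everything:", typical_log_growth(1.0))
```

Setting the derivative of the log-growth to zero by hand gives a stake of one quarter of current wealth for this bet, which is what the grid search lands on.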

Now, it’s easy to see this derivation and think “Ah, so the Kelly criterion optimizes your wealth after a large number of steps, whereas expected utility only looks one step ahead”. But, this is not at all the case. An expected money maximizer (EMM) thinking long-term will still take risky bets. Observe that (in the investment setting in which Kelly works) the EMM strategy for a single step doesn’t depend on the amount of money you have: you either put all your money in the best investment, or you keep all of your money because there are no good investments. Therefore, the expected payout of the EMM in a single step is some multiple C of the amount of money it begins that step with. Therefore, an EMM looking one step ahead just values its winnings at the end of the first step C times more, but this doesn’t change its behavior, since multiplying everything by C doesn’t change what the max-expectation strategy will be. Similarly, two-step lookahead only modifies things by C^2, and so on. So an EMM looking far ahead behaves just like one maximizing its holdings in the very next step.

The trick in the analysis is the way we replace a big sum over lots of possible ways things could go with a single “typical” outcome. This might initially seem like a mere computational convenience; after all, the vast vast majority of possible sequences have approximately the expected win/loss frequencies. Here, though, it makes all the difference, because it eliminates from consideration the worlds which have the highest weight in the EMM analysis: the worlds where things go really well and the EMM gets exponentially much money.
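One way to see how much the "typical outcome" substitution throws away is to compute, exactly, how much of the expected wealth comes from typical sequences. The sketch below does this for the all-in 1.5x/0.6x bet; the round count and the definition of "typical" (heads frequency within 5 percentage points of 50%) are arbitrary choices for illustration.

```python
from fractions import Fraction
from math import comb

N = 1000                   # rounds of the all-in 1.5x/0.6x bet (illustrative choice)
typical = range(450, 551)  # head counts within 5 percentage points of 50%

# With k heads, final wealth is 1.5**k * 0.6**(N-k), and C(N, k) of the 2**N
# equally likely sequences have k heads. Scaling wealth by 10**N keeps
# everything in exact integers: 10**N * wealth = 15**k * 6**(N-k).
prob_mass = Fraction(sum(comb(N, k) for k in typical), 2 ** N)
ev_share = Fraction(
    sum(comb(N, k) * 15 ** k * 6 ** (N - k) for k in typical),
    21 ** N,  # the same sum over all k, by the binomial theorem: (15 + 6)**N
)

print("probability of a typical sequence:", float(prob_mass))
print("share of expected wealth from typical sequences:", float(ev_share))
```

Nearly all of the probability sits on the typical sequences, while nearly all of the expectation sits elsewhere, which is why swapping the full sum for the typical outcome changes the answer rather than merely approximating it.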

OK, so, is the derivation just a mistake?

I think many English-language justifications of the Kelly criterion or log-wealth maximization are misleading or outright wrong. I don’t think we can justify it as an analysis of the best long-term strategy, because the analysis rules out any sequence other than those with the most probable statistics, which isn’t a move motivated by long-term analysis. I don’t think we can even justify it as “time average rather than ensemble average” because we’re not time-averaging wealth. Indeed, the whole point is supposedly to deal with the non-ergodic cases; but non-ergodic systems don’t have unique time-averaged behavior!

However, I ultimately find something convincing about the analysis: namely, from an evolutionary perspective, we expect to eventually find that only (approximate) log-wealth maximizers remain in the market (with non-negligible funds).

This conclusion is perfectly compatible with expected utility theory as embodied by the VNM axioms et cetera. It’s an argument that market entities will tend to have utility=log(money), at least approximately, at least in common situations which we can expect strategies to be optimized for. More generally, there *might* be an argument that evolved organisms will tend to have utility=log(resources), for many notions of resources.

However, maybe Nassim Nicholas Taleb would rebuke us for this tepid and timid conclusion. In terms of pure utility theory, applying a log before taking an expectation is a distinction without a difference: we were allowed any utility function we wanted from the start, so requiring an arbitrary transform means nothing. For example, we can “solve” the St. Petersburg paradox by claiming our utility is the log of money, but we can then re-create the paradox by putting all the numbers in the game through an exponential function! So what’s the point? We should learn from our past mistakes, and choose a framework which won’t be prone to those same errors.
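To spell out the "re-create the paradox" step, here is a short worked calculation. It assumes the textbook payout of 2^k dollars with probability 2^(-k), and ignores initial wealth and ticket price for simplicity.

```latex
% Log utility tames the original game: the expected utility is finite.
\[
\mathbb{E}\bigl[\ln(\text{payout})\bigr]
  = \sum_{k=1}^{\infty} 2^{-k} \ln\bigl(2^{k}\bigr)
  = \ln 2 \sum_{k=1}^{\infty} k \, 2^{-k}
  = 2 \ln 2 .
\]

% Exponentiate the payouts (pay e^{2^k} instead of 2^k) and it diverges again.
\[
\mathbb{E}\bigl[\ln(\text{payout}')\bigr]
  = \sum_{k=1}^{\infty} 2^{-k} \ln\bigl(e^{2^{k}}\bigr)
  = \sum_{k=1}^{\infty} 2^{-k} \cdot 2^{k}
  = \sum_{k=1}^{\infty} 1
  = \infty .
\]
```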

So, can we steelman the claims that expected utility theory is wrong? Can we find a decision procedure which is consistent with Peters’s general idea, but isn’t just log-wealth maximization?

Well, let’s look again at the Kelly-criterion analysis. Can we make that into a general-purpose decision procedure? Can we get it to produce results incompatible with VNM? If so, is the procedure at all plausible?

As I’ve already mentioned, there isn’t a clear way to apply the law-of-large-numbers trick in non-ergodic situations, because there is not a unique “typical” set of frequencies which emerges. Can we do anything to repair the situation, though?

I propose that we maximize median value. This gives a notion of “typical” which does not rely on an application of the law of large numbers, so it’s fine if the statistics of our sequence don’t converge to a single unique point. If they do, however, the median will evaluate things from that point. So, it’s a workable generalization of the principle behind Kelly betting.

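The median also relates to something mentioned in the OP, which considered being more strongly interested in something like “the 99th percentile outcome for the overall utility generated over my lifetime”.

The median is the 50th percentile, so there you go.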

Maximizing the median indeed violates VNM:

It’s discontinuous. Small differences in probability can change the median outcome by a lot. Maybe this isn’t so bad—who really cares about continuity, anyway? Yeah, seemingly small differences in probability create “unjustified” large differences in perceived quality of a plan, but only in circumstances where outcomes are sparse enough that the median is not very “informed”.

It violates independence, in a more obviously concerning way. A median-maximizer doesn’t care about “outlier” outcomes. It’s indifferent between the following two plans, which seems utterly wrong:

A plan with 100% probability of getting you $100

A plan with 60% probability of getting you $100, and 40% probability of getting you killed.

Both of these concerns become negligible as we take a long-term view. The longer into the future we look, the more outcomes there will be, making the median more robust to shifting probabilities. Similarly, a median-maximizer is indifferent between the two options above, but if you consider the iterated game, it will strongly prefer the global strategy of always selecting the first option.
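A small sketch of the two-plan example, one-shot and iterated. It assumes death ranks below every monetary outcome (encoded as negative infinity) and that one death ends the iterated game; the helper names and the lower-median convention are illustrative choices.

```python
DEATH = float("-inf")  # assumption: death ranks below every monetary outcome

def median_outcome(lottery):
    """Lower median of a discrete lottery given as (outcome, probability) pairs."""
    cumulative = 0.0
    for outcome, probability in sorted(lottery):
        cumulative += probability
        if cumulative >= 0.5:
            return outcome
    raise ValueError("probabilities sum to less than 0.5")

def repeated(p_survive, payoff, rounds):
    """Lottery for following one plan `rounds` times; a single death ends everything."""
    alive = p_survive ** rounds
    return [(payoff * rounds, alive), (DEATH, 1.0 - alive)]

safe, risky = (1.0, 100), (0.6, 100)  # (survival probability, dollars per round)
for rounds in (1, 2, 10):
    print(
        rounds,
        median_outcome(repeated(*safe, rounds)),
        median_outcome(repeated(*risky, rounds)),
    )
```

With one round the two medians coincide, which is the indifference described above. From two rounds on, survival under the risky plan drops below 50%, so its median outcome becomes death and the median-maximizer prefers the safe plan.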

Still, I would certainly not prefer to optimize median value myself, or create AGI which optimizes median value. What if there’s a one-shot situation which is similar to the 40%-death example? I think I similarly don’t want to maximize the 99th percentile outcome, although this is less clearly terrible.

Can we give an evolutionary argument for median utility, as a generalization of the evolutionary argument for log utility? I don’t think so. The evolutionary argument relies on the law of large numbers, to say that we’ll almost surely end up in a world where log-maximizers prosper. There’s no similar argument that we almost surely end up in the “median world”.

So, all told:

I don’t think there’s a good argument against expectation-maximization here.

But I do think those who think there is should consider median-maximization, as it’s an alternative to expectation-maximization which is consistent with much of the discussion here.

I basically buy the argument that utility should be log of money.

I don’t think it’s right to describe the whole thing as “time-average vs ensemble-average”, and suspect some of the “derivations” are question-begging.

I do think there’s an evolutionary argument which can be understood from some of the derivations, however.

I now like the “time vs ensemble” description better. I was trying to understand everything coming from a Bayesian frame, but actually, all of these ideas are more frequentist.

In a Bayesian frame, it’s natural to think directly in terms of a decision rule. I didn’t think time-averaging was a good description because I didn’t see a way for an agent to directly replace ensemble average with time average, in order to make decisions:

Ensemble averaging is the natural response to decision-making under uncertainty; you’re averaging over different *possibilities*. When you try to time-average to get rid of your uncertainty, you have to ask “time average *what*?”, because you don’t know what specific situation you’re in.

In general, the question of how to turn your current situation into a repeated sequence for the purpose of time-averaging analysis seems under-determined (even if you are certain about your present situation). Surely Peters doesn’t want us to use *actual* time in the analysis; in actual time, you end up dead and lose all your money, so the time-average analysis is trivial.

Even if you settle on a way to turn the situation into an iterated sequence, the necessary limit does not necessarily exist. This is also true of the possibility-average, of course (the St Petersburg Paradox being a classic example); but it seems easier to get failure in the time-average case, because you just need non-convergence; i.e., you don’t need any unbounded stuff to happen.

However, all of these points are also true of frequentism:

Frequentist approaches start from the objective/external perspective rather than the agent’s internal uncertainty. They don’t want to define probability as the subjective viewpoint; they want probability to be defined as limiting frequencies if you repeated an experiment over and over again. The fact that you don’t have direct access to these is a natural consequence of you not having direct access to objective truth.

Even given direct access to objective truth, frequentist probabilities are still under-defined because of the *reference class problem*: what infinite sequence of experiments do you conceive of your experiment as part of?

And, again, once you select a sequence, there’s no guarantee that a limit exists. Frequentism has to solve this by postulating that limits exist for the kinds of reference classes we want to talk about.

So, I now think what Ole Peters is working on is *frequentist decision theory*. Previously, the frequentist/Bayesian debate was about statistics and science, but decision theory was predominantly Bayesian. Ole Peters is working out the natural theory of decision making which frequentists could/should have been pursuing. (So, in that sense, it’s much more than just a new argument for Kelly betting.)

Describing frequentist-vs-Bayesian as time-averaging vs possibility-averaging (aka ensemble-averaging) seems perfectly appropriate.

So, on my understanding, Ole’s response to the three difficulties could be:

We first understand the optimal response to an objectively defined scenario; then, once we’ve done that, we can concern ourselves with the question of how to actually behave given our uncertainty about what situation we’re in. This is not trying to be a universal formula for rational decision making in the same way Bayesianism attempts to be; you might have to do some hard work to figure out enough about your situation in order to apply the theory.

And when we design general-purpose techniques, much like when we design statistical tests, our question should be whether *given an objective scenario* the decision-making technique does well, the same as frequentists wanting estimates to be unbiased. Bayesians want decisions and estimates to be optimal *given our uncertainty* instead.

As for how to turn your situation into an iterated game, Ole can borrow the frequentist response of not saying much about it.

As for the existence of a limit, Ole actually says quite a bit about how to fiddle with the math until you’re dealing with a quantity for which a limit exists. See his lecture notes. On page 24 (just before section 1.3) he talks briefly about finding an appropriate function of your wealth such that you can do the analysis. Then, section 2.7 says much more about this.

The general idea is that you have to choose an analysis which is appropriate to the dynamics. Additive dynamics call for additive analysis (examining the time-average of wealth). Multiplicative dynamics call for multiplicative analysis (examining the time-average of growth, as in Kelly betting and similar settings). Other settings call for other functions. Multiplicative dynamics are common in financial theory because so much financial theory is about investment, but if we examine financial decisions for those living on income, then it has to be very different.
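As a sketch of what "choose an analysis appropriate to the dynamics" could look like in symbols: the notation below is my own rather than taken from the lecture notes, and it assumes independent, identically distributed rounds (with T counted in rounds) for which the expectations exist.

```latex
% Additive dynamics: wealth changes by a random increment D each round.
\[
x(t+1) = x(t) + D
\quad\Longrightarrow\quad
\lim_{T \to \infty} \frac{x(T) - x(0)}{T} = \mathbb{E}[D]
\quad \text{(almost surely)} .
\]

% Multiplicative dynamics: wealth is multiplied by a random factor R each round.
\[
x(t+1) = x(t) \cdot R
\quad\Longrightarrow\quad
\lim_{T \to \infty} \frac{\ln x(T) - \ln x(0)}{T} = \mathbb{E}[\ln R]
\quad \text{(almost surely)} .
\]

% For the 1.5x/0.6x coin flip, R is 1.5 or 0.6 with equal probability:
\[
\mathbb{E}[\ln R] = \tfrac{1}{2} \ln(1.5 \cdot 0.6) = \tfrac{1}{2} \ln 0.9 < 0 .
\]
```

Both limits follow from the law of large numbers, applied to the increments in the additive case and to the log-factors in the multiplicative case.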

Thanks for taking the time to delve into this!

You note that expected utility with a risk-averse utility function is sufficient to make appropriate choices [in those particular scenarios].

This is a slight tangent, but I’m curious to what extent you think people actually follow something that approximates this utility function in real life? It seems like some gamblers instinctively use a strategy of this nature (e.g. playing with house money) or explicitly run the numbers (e.g. the Kelly criterion). And I doubt that anyone is dumb enough to keep betting their entire bankroll on a positive EV bet until they inevitably go bust.

But in other cases (like retirement planning, as you mentioned) a lot of people really do seem to make the mistake of relying on ensemble-average probabilities. Some of them will get burned, with much more serious consequences than merely making a silly bet at the casino.

I guess what I’m asking is: even if Peters et al are wrong about expected utility, do you think they’re right about the dangers of failing to understand ergodicity?

Not sure. I can’t tell what additional information, if any, Peters is contributing that you can’t already get from learning about the math of wagers and risk-averse utility functions.

It seems to me like it’s right. So far as I can tell, the “time-average vs ensemble average” argument doesn’t really make sense, but it’s still true that log-wealth maximization is a distinguished risk-averse utility function with especially good properties.

Idealized markets will evolve to contain only Kelly bettors, as other strategies either go bust too often or have sub-optimal growth.

BUT, keep in mind we don’t live in such an idealized market. In reality, it only makes sense to use this argument to conclude that financially savvy people/institutions will be approximate log-wealth maximizers—IE, the people/organizations with a lot of money. Regular people might be nowhere near log-wealth-maximizing, because “going bust” often doesn’t literally mean dying; you can be a failed serial startup founder, because you can crash on friends’/parents’ couches between ventures, work basic jobs when necessary, etc.

More generally, evolved organisms are likely to be approximately log-resource maximizers. I’m less clear on this argument, but the situation seems analogous. It therefore may make sense to suppose that humans are approximate log-resource maximizers.

(I’m not claiming Peters is necessarily adding anything to this analysis.)

(I’ve only spent several hours thinking about this, so I’m not confident in what I say below. I think Ole Peters is saying something interesting, although he might not be phrasing things in the best way.)

Time-average wealth maximization and utility=log(wealth) give the same answers for multiplicative dynamics, but for additive dynamics they can prescribe different strategies. For example, consider a game where the player starts out with $30, and a coin is flipped. If heads, the player gains $15, and if tails, the player loses $11. This is an additive process since the winnings are added to the total wealth, rather than calculated as a percentage of the player’s wealth (as in the 1.5x/0.6x game). Time-average wealth maximization asks whether (15−11)/2>0, and takes the bet. The agent with utility=log(wealth) asks whether (log(30+15)+log(30−11))/2>log30, and refuses the bet.

What happens when this game is repeatedly played? That depends on what happens when a player reaches negative wealth. If debt is allowed, the time-average wealth maximizer racks up a lot of money in almost all worlds, whereas the utility=log(wealth) agent stays at $30 because it refuses the bet each time. If debt is not allowed, and instead the player “dies” or is refused the game once they hit negative wealth, then with probability at least 1/8, the time-average wealth maximizer dies (if it gets tails on the first three tosses), but when it *doesn’t* manage to die, it still racks up a lot of money.

In a world where this was the “game of life”, the utility=log(wealth) organisms would soon be out-competed by the time-average wealth maximizers that happened to survive the early rounds. So the organisms that tend to evolve in this environment will have utility linear in wealth.
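The additive game above is easy to check numerically. The sketch below evaluates the two one-shot criteria from the text and then simulates the no-debt version, treating wealth below zero as death; the round count, trial count, seed, and function name are arbitrary illustrative choices.

```python
import math
import random

START, GAIN, LOSS = 30, 15, 11

# One-shot decisions as described in the text.
linear_accepts = (GAIN - LOSS) / 2 > 0
log_accepts = (math.log(START + GAIN) + math.log(START - LOSS)) / 2 > math.log(START)
print("linear-in-wealth player accepts:", linear_accepts)
print("log-wealth player accepts:", log_accepts)

def play_without_debt(rounds, rng):
    """Always accept the bet; return final wealth, or None if wealth goes negative."""
    wealth = START
    for _ in range(rounds):
        wealth += GAIN if rng.random() < 0.5 else -LOSS
        if wealth < 0:
            return None
    return wealth

rng = random.Random(0)
results = [play_without_debt(500, rng) for _ in range(5000)]
survivors = [w for w in results if w is not None]
ruin_rate = 1 - len(survivors) / len(results)

print("ruin rate:", ruin_rate)
print("mean survivor wealth:", sum(survivors) / len(survivors))
```

The simulated ruin rate should come out above the 1/8 lower bound, since three opening tails is only one of several ways to go negative, while the survivors end up far richer than the $30 held by a player who always refuses.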

So I understand Ole Peters to be saying that time-average wealth maximization adapts to the game being played, in the sense that organisms which follow its prescriptions will tend to out-compete other kinds of organisms.

Tangentially: reading about the history of gambling theory (the “unfinished game” problem, etc.) is pretty interesting.

Imagine how weird it was when people basically didn’t understand expected value at all! Did casinos even know what they were doing, or did they somewhat routinely fail after picking the wrong game design? Did they only settle on profitable designs by accident? Are blackjack, roulette, and other very old games still with us because they happened not to bankrupt casinos that ran them, and were only later analyzed with tools capable of identifying whether the house had the edge?

1. Something like MVP. Don’t start by throwing a brand new game out there—even if you have the edge in the game, you have to get people to play it. Getting the stuff for a new game + advertising costs money. Test it out a little (small scale). If you lose money testing it*, you paid a little bit of money to find out you’d have lost a lot of money if you’d tried it out big. (More naturally—big companies are at times known for staying the same, with startups coming in with new ideas. If you copy ideas from other people that haven’t bankrupted them...)

2. It seems like it’s possible to make it by on “this is unlikely” or setting things up so you always win. (I notice snake eyes doesn’t come up a lot. (Perhaps I check this by rolling dice a bunch of times.))

Simplest case: you buy a place, and pool equipment, and you rent it out to people. If they make bets with each other on the outcome, you don’t care—they’re just paying you so they can play pool.

Slightly more complicated: you offer to handle the betting on the game. People pay you a little to be able to bet (and later, to win big), but the money all comes from them, and you don’t care who wins—you make money off people playing, people watching, and people betting!

3. Were casinos a thing before probability was understood?

*One game night with a few people, maybe you and your friends? If you have people who are happy to try out a new game, without real money, (For Free! perhaps?), that’s a place to start initially—and all you lose is the time to run it. If you have fun, then maybe that’s a small price to pay. And if people are willing to pay to play a game with fake money, then you can just print more monopoly money if you run out—no odds calculation needed for a sure bet.

This seems to assume the people who did the origination here were casinos or explicit entrepreneurs, instead of people who started gambling informally and then started with some sense of which games had which payoffs.

(Or rather, maybe you’re explicitly not assuming that and that’s your point. But the way I’d make the same point you seem to be making here is not “they operated like a startup” and more like “they operated like a group of friends/rivals/communities incrementally experimenting, and by the time someone considered starting an explicit business, good gamblers had some intuitive sense of how games worked.”)

Yeah, there’s reverse causality in assuming purpose—I wrote to explain how the reader could make such a thing intentionally without resorting to “entrepreneurs gambled by starting casinos and pseudo-darwinian survival of the business whose games don’t lose them money led to the casinos of today”. This is probably a side effect of my constructionist tendencies. (I feel like the points I came up with in 5 minutes, which don’t reference odds, are within the imagination of a business owner whose livelihood is at stake.)

a) That was in the footnote, and point 2, respectively, though you put it way more clearly. b) I suggested the possibility that they could arise without doing odds at all, or even starting *not* with games of chance. c) I would further note that being a casino and “incrementally experimenting”/“operating like a group of friends” need not be incompatible: consider a game store. You buy a new game. If it’s not popular, you lose a little. If it’s really popular you buy a lot more.

Peters’ December 2019 Nature Physics paper (https://www.nature.com/articles/s41567-019-0732-0 ) provides some perspective on the 1.5x/0.6x coin flip example and other conclusions of the above discussion. (If Peters’ claims have changed along the way, I wouldn’t know.)

In my reading, Peters’ basic claim there is not that ergodicity economics can solve the coin flip game in a way that classical economics cannot (because it can, by switching to expected log wealth utility instead of expected wealth), but that utility functions as originally presented are a crutch that misinforms us about people’s psychological motives in making economic decisions. So, while the mathematics of many parts stays the same, the underlying phenomena can be more saliently reasoned about by looking at the individual growth rates in the context of whether the associated wealth “process” is additive or multiplicative or something else. Thus there is also less need to use lingo where people may have an (innate, weirdly) “risk-averse utility function” (as compared to some other less risk-averse theoretical utility function).