Why Bet Kelly?

The Kelly criterion is an elegant, but often misunderstood, result in decision theory. To begin with, suppose you have some amount of some resource, which you would like to increase. (For example, the resource might be monetary wealth.) You are given the opportunity to make a series of identical bets. You choose some fraction $f$ of your wealth to wager; then, in each bet, you gain a fraction $f$ of your current wealth with probability $p$, and lose a fraction $f$ with probability $(1 - p)$.^[1]

In other words, suppose $W_{n}$ is your wealth after $n$ bets. We will define $Z_{n} = \log W_{n}$, and we will suppose for simplicity that $Z_{0} = 0$. Then $Z_{n} = \sum_{t = 1}^{n} R_{t}$, where each $R_{t}$ is an independent copy of a random variable $R$ defined as:

$$R = \begin{cases} \log (1 + f) & \text{with probability } p \\ \log (1 - f) & \text{with probability } (1 - p) \end{cases}$$

Now suppose that, for some reason, we want to maximize $E [Z_{n}]$ . By linearity of expectation, $E [Z_{n}] = \sum_{t = 1}^{n} E [R]$ . Hence, we should simply maximize $E [R]$ . This amounts to solving:

$$\begin{aligned} 0 &= \frac{\partial}{\partial f} E [R] \\ 0 &= \frac{\partial}{\partial f} \left[ p \log (1 + f) + (1 - p) \log (1 - f) \right] \\ 0 &= (1 - f) p - (1 - p) (1 + f) \\ f &= 2 p - 1 \end{aligned}$$

This, $f = 2 p - 1$, is known as the Kelly bet. For example, it says that if you have a 60-40 edge (i.e., $p = 0.6$), then you should bet $f = 2 (0.6) - 1 = 0.2$, i.e., bet $20\%$ of your current wealth on each bet.
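The derivation above can be checked numerically. The following is a minimal sketch, assuming the even-money setup of this article; the function names (`expected_log_growth`, `kelly_fraction`, `grid_argmax`) and the grid resolution are illustrative choices, not part of the original result.

```python
import math


def expected_log_growth(f: float, p: float) -> float:
    """E[R] = p*log(1+f) + (1-p)*log(1-f) for an even-money bet of fraction f."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)


def kelly_fraction(p: float) -> float:
    """Closed-form Kelly bet for even-money odds: f = 2p - 1."""
    return 2 * p - 1


def grid_argmax(p: float, steps: int = 10_000) -> float:
    """Brute-force check: the f on a uniform grid over [0, 1) that maximizes E[R]."""
    # f = 1 is excluded because log(1 - f) diverges there.
    candidates = [i / steps for i in range(steps)]
    return max(candidates, key=lambda f: expected_log_growth(f, p))


if __name__ == "__main__":
    p = 0.6
    print("closed form:", kelly_fraction(p))
    print("grid search:", grid_argmax(p))
```

For $p = 0.6$, the grid search should land on (approximately) the same value as the closed form, since $E [R]$ is concave in $f$ and has a single interior maximum.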

That all seems pretty reasonable. But why do we want to maximize $E [Z_{n}]$ ? If we were to simply maximize expected wealth, i.e., $E [W_{n}]$ , then a straightforward calculation shows that we should not bet Kelly—in fact, we should bet $f = 1$ (“YOLO”), wagering the entire bankroll on every bet. This seems extremely counterintuitive, since, after $n$ bets, our wealth would then be:

$$W_{n} = \begin{cases} 0 & \text{with probability } 1 - p^{n} \\ 2^{n} & \text{with probability } p^{n} \end{cases}$$

In other words, as $n$ grows large, we would almost surely go bankrupt! Nevertheless, this would be the way to maximize $E [W_{n}]$. Kelly, whatever its merits, does not maximize $E [W_{n}]$, not even in the long run. Especially not in the long run.
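The "straightforward calculation" can be spelled out as follows; this is a sketch assuming starting wealth $W_{0} = 1$ (equivalently, $Z_{0} = 0$) and independent bets. Each bet multiplies wealth by $(1 + f)$ with probability $p$ and by $(1 - f)$ with probability $(1 - p)$, so the expectation of the product factors into a product of expectations:

$$E [W_{n}] = \left( p (1 + f) + (1 - p) (1 - f) \right)^{n} = \left( 1 + (2 p - 1) f \right)^{n}$$

When $p > 1/2$, this is strictly increasing in $f$ on $[0, 1]$, so it is maximized at $f = 1$, where $E [W_{n}] = (2 p)^{n}$. This agrees with the distribution above: $2^{n} \cdot p^{n} = (2 p)^{n}$.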

We now come to the perennial debate: why does Kelly seem “obviously right”, and YOLO “obviously wrong”? There are many answers usually offered to this question.

First, what we believe to be the correct answer:

Utility is not linear in wealth. As originally observed by Bernoulli, utility tends to be approximately logarithmic in wealth. If utility happens to be exactly logarithmic in wealth, then the Kelly bet is optimal. For most people, in most circumstances, utility is approximately logarithmic in wealth, so the Kelly bet is approximately optimal. On the other hand, utility is very far from being linear in wealth, and so YOLO is a very bad idea.

In a certain sense, it is as simple as that. The von Neumann-Morgenstern utility theorem (vNM) tells us that we should be optimizing $E [U]$ for some utility function $U$. We know that the Kelly criterion always optimizes $E [Z_{n}] = E [\log W_{n}]$. Therefore, if the Kelly criterion is optimal, it is because $U = \log W_{n}$.

Now, there are many other answers to “why bet Kelly?” that initially seem plausible:

Kelly maximizes the expected growth rate, $\lim_{n \to \infty} E [W_{n}^{1 / n}]$. This happens to be true, and speaks to the elegance of Kelly’s result.^[2] However, unless for some reason you find yourself in a contest where you only win the prize if you have the highest expected growth rate, this is not a good reason to bet Kelly. vNM says we should maximize expected utility, not maximize expected growth rate.
Kelly maximizes the geometric mean of wealth, $M = \prod_{v} v^{\Pr [W_{n} = v]}$. This is also evidently true, as $\log M$ is precisely $E [\log W_{n}] = E [Z_{n}]$. However, vNM says we should maximize expected utility (i.e., arithmetic mean of utility), not geometric mean of wealth. Again, if utility happens to be approximately logarithmic in wealth, then maximizing the geometric mean of wealth feels right, but it’s because of the logarithmic utility of wealth.
The Kelly bettor, with high probability, ends up with higher wealth than the non-Kelly bettor. This is particularly evident when Kelly is compared with YOLO. But, again, vNM does not say “maximize wealth with high probability”; it says “maximize expected utility”.
We should try to optimize something that has nice properties (e.g., can be time-averaged, or can be optimized myopically [Mossin, 1968], [Hakansson, 1971]). There is certainly an argument that, if our utility function happens to already be approximately logarithmic, then we might want to adopt logarithmic utility as a heuristic, since it has these nice properties. However, ultimately our true utility function is what it is. If we claim that Kelly is optimal, and we claim that our true utility function is not logarithmic in wealth, then we are rejecting vNM.
We should simply reject vNM, and optimize something else as a terminal value (e.g., geometric mean or maximin). This seems quite drastic, as the vNM axioms are very mild assumptions.
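Several of the claims in the list above (Kelly maximizes the geometric mean and tends to come out ahead, while YOLO maximizes the arithmetic mean) can be checked exactly, because $W_{n}$ depends only on the number of wins, which is binomially distributed. The sketch below assumes $W_{0} = 1$ and even-money bets; the helper names `wealth_distribution` and `summarize` are hypothetical, introduced only for illustration.

```python
import math


def wealth_distribution(f: float, p: float, n: int):
    """Exact distribution of W_n (with W_0 = 1): one (wealth, probability) pair per win count k."""
    return [
        (
            (1 + f) ** k * (1 - f) ** (n - k),
            math.comb(n, k) * p**k * (1 - p) ** (n - k),
        )
        for k in range(n + 1)
    ]


def summarize(f: float, p: float, n: int):
    """Return (arithmetic mean, geometric mean, median) of W_n."""
    dist = wealth_distribution(f, p, n)
    mean = sum(w * q for w, q in dist)

    # Geometric mean is exp(E[log W_n]); it is zero if ruin has positive probability.
    if any(w == 0 and q > 0 for w, q in dist):
        geo = 0.0
    else:
        geo = math.exp(sum(q * math.log(w) for w, q in dist))

    # Median: smallest wealth whose cumulative probability reaches 1/2.
    cumulative, median = 0.0, None
    for w, q in sorted(dist):
        cumulative += q
        if cumulative >= 0.5:
            median = w
            break
    return mean, geo, median


if __name__ == "__main__":
    p, n = 0.6, 100
    for label, f in [("Kelly", 2 * p - 1), ("half-wealth", 0.5), ("YOLO", 1.0)]:
        mean, geo, median = summarize(f, p, n)
        print(f"{label:12s} mean={mean:.4g} geometric mean={geo:.4g} median={median:.4g}")
```

With $p = 0.6$ and $n = 100$, the YOLO bettor has by far the largest arithmetic mean, but a median and geometric mean of zero, while the Kelly bettor has the largest geometric mean of the three. None of this settles which quantity we ought to maximize; that is the point of the vNM argument above.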

So, we claim, if Kelly is optimal then it is because our utility function is $U = \log W_{n}$. However, this is not the whole story. The utility function $U$ refers to the utility of wealth at the moment after the betting experiment, not the terminal utility of wealth in general. We can imagine that this experiment is just the preamble to a much longer game, in which $U_{T}$ is the ultimate terminal value of wealth (e.g., in number of lives saved), and we are investing over $T$ time steps where, in each step, we have the opportunity to place a bet with some statistical edge $p : (1 - p)$. We can then use backward induction to determine the utility function that we should adopt for wealth at previous points in the game: $U_{T - 1}, U_{T - 2}, \dots, U_{0}$. It is this final function, $U_{0} (W)$, that we should treat as our “utility function” in the preamble experiment.

Now, suppose we ultimately have something like this as our terminal utility function:

$$U_{T} (W) = \begin{cases} W & \text{if } W < C \\ C & \text{otherwise} \end{cases}$$

In other words, number-of-lives-saved is linear in money up to a certain point, then flat: an exaggerated version of the phenomenon of diminishing returns. As it turns out, when we apply backward induction for reasonably large values of $T$ (e.g., $T = 100$) and modest statistical edge (e.g., $p = 0.55$), we obtain a preamble utility function $U_{0} (W)$ that looks something like this (taking $C = 1$ for simplicity):

In general, this function “looks more like a logarithm” than the piecewise-linear function $U_{T}$ , and falls off sharply as we approach zero. Clearly it is not actually a logarithm, as it is bounded above and below (and is, in fact, equal to $1$ for values $W \geq 1$ ). But, for a broad class of terminal utility functions $U_{T}$ , the resulting function $U_{0}$ looks surprisingly logarithm-like.
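The backward induction described above can be sketched numerically. The article does not specify a numerical method, so everything below is an assumption made for illustration: the function name `preamble_utility`, the wealth grid, the discrete set of candidate bet fractions, and the use of linear interpolation. At each step it applies $U_{t} (W) = \max_{f} \left[ p \, U_{t + 1} (W (1 + f)) + (1 - p) \, U_{t + 1} (W (1 - f)) \right]$, starting from $U_{T} (W) = \min (W, C)$.

```python
import numpy as np


def preamble_utility(T=100, p=0.55, C=1.0, n_wealth=400, n_frac=101):
    """Backward induction from U_T(W) = min(W, C) down to U_0(W).

    Returns (grid, U0): a wealth grid on [0, C] and U_0 evaluated on it.
    Grid sizes and linear interpolation are illustrative choices.
    """
    # Wealth grid: 0 plus geometrically spaced points up to C (finer near zero).
    grid = np.concatenate(([0.0], np.geomspace(1e-6 * C, C, n_wealth)))
    fractions = np.linspace(0.0, 1.0, n_frac)  # candidate bets, including f = 0

    U = np.minimum(grid, C)  # terminal utility U_T on the grid
    up = np.outer(grid, 1.0 + fractions)    # wealth after a win, shape (wealth, fraction)
    down = np.outer(grid, 1.0 - fractions)  # wealth after a loss

    for _ in range(T):
        # np.interp clamps above the grid, which matches U(W) = C for W >= C.
        win = np.interp(up.ravel(), grid, U).reshape(up.shape)
        lose = np.interp(down.ravel(), grid, U).reshape(down.shape)
        # Keep the best candidate bet at each wealth level.
        U = (p * win + (1.0 - p) * lose).max(axis=1)
    return grid, U


if __name__ == "__main__":
    grid, U0 = preamble_utility()
    for w in (0.01, 0.1, 0.25, 0.5, 0.75, 1.0):
        print(f"U_0({w}) ~ {np.interp(w, grid, U0):.4f}")
```

Because $f = 0$ is always among the candidates, each step can only raise the utility at a given wealth, so the computed $U_{0}$ lies on or above $U_{T}$ everywhere on $[0, C]$. Plotting `U0` against `grid` (for example with a plotting library of your choice) gives a rough picture of the shape discussed above; the exact values depend on the grid resolution.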

In summary, the Kelly criterion is an elegant, and surprisingly simple, formula for optimizing $E [\log W]$. As a general strategy, optimizing $E [\log W]$ is appealing in a number of ways:

It has many aesthetically appealing properties: it maximizes geometric growth rate; it maximizes the geometric mean over outcomes; it results in outperforming other bettors with high probability; and it is stable in the sense of Mossin and Hakansson.
Separately, $\log W$ is often a good approximation to the true utility of money-after-the-bet, if the scenario specifies a long series of subsequent opportunities to make bets. When we examine the instrumental utility function $U_{0}$ that arises from applying backward induction to such a series of opportunities, we find that it often “looks like a logarithm”.

However, we should remember that the Kelly bet, ultimately, is only an approximation. The true optimal bet, the one that actually maximizes expected utility $E [U_{T}]$, may be significantly different, in either direction.

Acknowledgements: We would like to thank davidad for many helpful comments on earlier drafts of this article.

^[1] Note that some definitions of the Kelly betting experiment are slightly more complicated, as they presume that one wins $b f$ with probability $p$ and loses $a f$ with probability $(1 - p)$. In this document, for simplicity, we take $a = b = 1$.
^[2] To show this, note that $\lim_{n \to \infty} \frac{1}{n} \sum_{t = 1}^{n} R = \delta (E [R])$, i.e., a point mass at $E [R]$, and hence $\lim_{n \to \infty} W_{n}^{1 / n} = \lim_{n \to \infty} \exp \left( \frac{1}{n} \sum_{t = 1}^{n} R \right) = \delta (\exp E [R])$, whose expectation is maximized when we maximize $E [R]$.