Never Go Full Kelly

SimonM, 25 Feb 2021 12:53 UTC

Introduction

I am assuming the reader is at least somewhat familiar with the Kelly Criterion. If not, I suggest you read this first. There is a sense in which I am mostly just adding meat to the bones of “Never Go Full Kelly” (or, more narrowly, his last two points):

You can’t handle the swings
You don’t have full knowledge of your edge

Hopefully I can give you some rules of thumb for how to handle both those issues:

Scale down all bets “fractionally”
Scale down by these specific fractions:
- A function of your risk-adjusted edge, $1 - \frac{1}{1 + SR^{2}}$, where SR is your Sharpe Ratio (edge divided by standard deviation)
- A function of how much information is in your prediction vs your counterparty (your “information” / the “information” between you)
You want to do both if you can’t handle the swings AND don’t have full knowledge. (Which is basically everyone).

For this post, $f$ is the Kelly fraction, $p$ is your estimate of the probability of an event, $q$ is the market / your counterparty’s probability. In this notation:

$f = \frac{p - q}{1 - q}$
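As a quick sanity check on this notation, here is a minimal Python sketch. The function names are mine (purely illustrative), and I'm assuming a binary bet priced at probability $q$, so the net odds are $b = (1 - q)/q$. It just confirms that the price form above agrees with the textbook odds form $f = p - (1 - p)/b$.

```python
def kelly_fraction(p: float, q: float) -> float:
    """Kelly fraction f = (p - q) / (1 - q) for a binary bet priced at probability q."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    return (p - q) / (1.0 - q)


def kelly_fraction_from_odds(p: float, b: float) -> float:
    """Textbook form f = p - (1 - p) / b, with b the net payout per unit staked."""
    return p - (1.0 - p) / b


p, q = 0.6, 0.5              # you think 60%, the market offers evens
b = (1.0 - q) / q            # net odds implied by the market price q (assumption)
f_price = kelly_fraction(p, q)
f_odds = kelly_fraction_from_odds(p, b)
print(f"price form: {f_price:.4f}, odds form: {f_odds:.4f}")
```

A negative result means the bet is on the wrong side at that price (you would want the other side, if anything).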

After being introduced to the Kelly criterion the next thing people are introduced to is “fractional Kelly”. Specifically the idea is rather than wagering the Kelly fraction $f$ on an event, you wager $α f$ (typically with $0 < α < 1$ ).

What’s wrong with Kelly?

Kelly is pretty aggressive compared to most people’s risk tolerance. Upon seeing a Kelly bet, some people will be dubious about how aggressive it is. If someone offers you evens on something you think is 60%, Kelly would have you betting 20% of your net worth. 20% seems like an awful lot to bet on something which is only going to happen 60% of the time. You can play this game yourself here and see how comfortable you are with the drawdowns with “fake” money. (This was originally from a paper where the participants did not perform well). If you don’t fancy playing, consider this thought experiment: betting 20% each time, 3 tails in a row is a ~50% drawdown ($0.8^{3} \approx 0.51$), and this will happen ~once every 24 flips.
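The numbers in that thought experiment can be reproduced in a few lines. This is a sketch under my reading of it: a 60% coin at evens, 20% staked on every flip, and “once every 24 flips” taken as the expected wait for the first run of three losses (the standard formula for runs in independent trials).

```python
p_win = 0.6      # your probability of heads
stake = 0.2      # Full Kelly fraction at evens
run = 3          # consecutive losses

# Wealth left after `run` losing bets in a row, each risking `stake` of current wealth.
wealth_after_run = (1.0 - stake) ** run
drawdown = 1.0 - wealth_after_run

# Expected number of flips until the first run of `run` consecutive losses.
p_loss = 1.0 - p_win
expected_flips = (1.0 - p_loss ** run) / (p_loss ** run * (1.0 - p_loss))

print(f"drawdown after {run} losses: {drawdown:.1%}")
print(f"expected flips until such a run: {expected_flips:.1f}")
```

So the drawdown is a shade under 50%, and the expected wait comes out at roughly 24 flips.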

To give a more recent example, before the 2020 Presidential election, the betting markets were giving Biden a 60% chance. If you believed 538, and put it at ~90%, would you have been comfortable staking 75% of your net worth on Biden? (That is the Full Kelly stake: $f = \frac{0.9 - 0.6}{1 - 0.6} = 0.75$.)

Let’s have a look at what the geometric growth rate looks like for different fractions of a Kelly bet. Growth is maximised at Full Kelly:

This is plotting fractional-Kelly coefficient ( $α$ ) (1 is Full Kelly, 0.5 is half Kelly etc) vs expected growth rate (betting with a 2% edge into a 50% market).

There’s nothing special about “2%” here. Using a different set-up (a 30% edge into a 50% market, so $f = 0.6$ and you can’t bet more than $α = 1.66 \dots$) we have a chart that looks like:
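If you want to regenerate curves like these yourself, a minimal sketch follows. I'm assuming “2% edge into a 50% market” means $p = 0.52$, $q = 0.5$, and that the growth rate is the expected log-return per bet; the function name is mine.

```python
import math


def growth_rate(alpha: float, p: float, q: float) -> float:
    """Expected log-growth per bet when staking alpha times the Kelly fraction."""
    f = (p - q) / (1.0 - q)      # Full Kelly fraction
    b = (1.0 - q) / q            # net odds implied by the market price
    stake = alpha * f
    return p * math.log(1.0 + stake * b) + (1.0 - p) * math.log(1.0 - stake)


p, q = 0.52, 0.50                           # assumed reading of "2% edge into a 50% market"
alphas = [i / 10 for i in range(0, 21)]     # 0.0, 0.1, ..., 2.0
rates = [growth_rate(a, p, q) for a in alphas]
best_alpha = alphas[rates.index(max(rates))]

for a, g in zip(alphas, rates):
    print(f"alpha={a:.1f}  growth={g:+.6f}")
print("best alpha on this grid:", best_alpha)
```

Swapping in `p = 0.8` gives the second set-up, but then the grid has to stop before $α = 1/0.6$, where the stake reaches your whole bankroll and the log blows up.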

So far, we’ve been assuming that we know the probability of an event occurring, so we can perfectly balance on the peak of that curve. Now let’s assume there’s some uncertainty in that. (Bayesians might get a little uncomfortable here: posterior distributions for discrete events are point estimates. Instead imagine you view the event as a Bernoulli random variable with parameter $p$, and you have a posterior distribution for $p$.*)

The peak of that curve looks vaguely symmetric, so you could be forgiven for thinking that you get penalised equally for being over- or under-confident (and therefore that, if your posterior distribution for $p$ is symmetric, betting Full Kelly is optimal). Unfortunately this is not true. The heuristic argument for this takes a little bit of algebra, but roughly stated, you can check that the derivative of the growth rate with respect to $α$, averaged over $α = 1$ plus-or-minus a small amount, is negative, therefore we’ve already passed the “optimal” point. To use a more concrete example:

Baker and McHale model the posterior of p as a Beta-distribution. How does our growth-rate vs Kelly fraction change with increasing uncertainty?

Expected utility of the optimum Kelly bet shrunk by the shrinkage coefficient $k$, when $b = 1$, $p = 0.7$. The curves have $σ = 0.05, 0.1, 0.15, 0.2, 0.25$ (lowest at top).

First thing to note: growth rate is always sloping down at $k = 1 = α$, i.e. Full Kelly is not optimal whenever there is any uncertainty!
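You can see the same effect without the Beta machinery. The sketch below is a deliberately crude stand-in (my assumption, not Baker and McHale's model): the true probability is 0.7, but your estimate comes out at $0.7 \pm σ$ with equal chance, and you stake $k$ times the Kelly fraction implied by whichever estimate you got.

```python
import math


def expected_growth(k: float, p_true: float, sigma: float, b: float = 1.0) -> float:
    """Expected log-growth when the estimate is p_true +/- sigma (50/50)
    and the stake is k times the Kelly fraction implied by that estimate."""
    total = 0.0
    for p_hat in (p_true - sigma, p_true + sigma):
        stake = k * (p_hat - (1.0 - p_hat) / b)     # shrunk Kelly bet from the noisy estimate
        total += 0.5 * (p_true * math.log(1.0 + b * stake)
                        + (1.0 - p_true) * math.log(1.0 - stake))
    return total


p_true, sigma = 0.7, 0.1
ks = [i / 100 for i in range(50, 121)]              # k from 0.50 to 1.20
values = [expected_growth(k, p_true, sigma) for k in ks]
best_k = ks[values.index(max(values))]

# Central finite difference for the slope of the curve at k = 1.
slope_at_one = (expected_growth(1.01, p_true, sigma)
                - expected_growth(0.99, p_true, sigma)) / 0.02

print(f"best k on grid: {best_k:.2f}")
print(f"slope at k = 1: {slope_at_one:+.4f}")
```

Even with a symmetric error in the estimate, the slope at $k = 1$ is negative and the best $k$ sits below 1: the over-bets cost more than the under-bets.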

They then go on to describe (in their model) how to find the correct “shrinkage coefficient” (what I call $α$) as a function of uncertainty:

$α = \frac{(p - q)^{2}}{(p - q)^{2} + σ^{2}} = \frac{S R^{2}}{S R^{2} + 1}$

I think this is a pretty decent rule of thumb, which is worth keeping in mind. (We’ll come back to another interpretation of this later). (SR = Sharpe Ratio = “Edge divided by standard deviation”). “The higher your (risk-adjusted) edge, the closer to Full Kelly you can go”.
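To get a feel for the rule of thumb, here it is as a small Python sketch (function names are mine). The inputs reuse the $p = 0.7$, $b = 1$ set-up from the figure above, so $q = 0.5$, with the same values of $σ$.

```python
def shrinkage(p: float, q: float, sigma: float) -> float:
    """Rule-of-thumb Kelly multiplier: edge^2 / (edge^2 + sigma^2)."""
    edge = p - q
    return edge ** 2 / (edge ** 2 + sigma ** 2)


def shrinkage_from_sharpe(sr: float) -> float:
    """Same quantity in terms of SR = edge / sigma."""
    return sr ** 2 / (sr ** 2 + 1.0)


p, q = 0.7, 0.5
for sigma in (0.05, 0.10, 0.15, 0.20, 0.25):
    sr = (p - q) / sigma
    print(f"sigma={sigma:.2f}  SR={sr:.2f}  alpha={shrinkage(p, q, sigma):.3f}")
```

When your uncertainty is as large as your edge (SR = 1) the rule says half Kelly; at SR = 2 it says 80% of Kelly.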

* Yes, if you had your posterior distribution and your utility you can happily go off and optimize to your heart’s content. I am mostly aiming this at people who only have a vague sense for what their uncertainty and utility are.

Respecting the market / your counterparty

Moving away from thinking about our posterior in isolation, we’ll consider a world in which we are now being presented with a bet. Let’s start by assuming that the market (or Alice who we’re betting against) has information we don’t. Let’s also assume that we have information the market (Alice) doesn’t. (Otherwise we really shouldn’t be betting).

When betting, we should update our probability upon seeing the market (or our counterparty’s) odds. One way to do this update is to create a weighted average of $p, q$ weighted by how much “information” (I’m being pretty loose with terminology here) they contain. ( $α = 1$ , we know everything the market does (and something else); $α = 0$ , the market knows everything we do (and something else) - we will take their probability as our own; $α$ somewhere in-between, we have some information the market is missing and they have some we don’t have. I’ll make this more precise with a toy model to give more intuition).

$\hat{p}(p, q) = α p + (1 - α) q$.

In this instance, our new Kelly fraction is fractional Kelly.

$\hat{f} = \frac{\hat{p} - q}{1 - q} = \frac{α \cdot p - α \cdot q}{1 - q} = α \left(\frac{p - q}{1 - q}\right) = α f$
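The algebra is short, but it is easy to check numerically too. A minimal sketch with made-up numbers (the values of `p`, `q` and `alpha` are purely illustrative, and the function names are mine):

```python
def kelly_fraction(p: float, q: float) -> float:
    """Kelly fraction for a binary bet priced at probability q."""
    return (p - q) / (1.0 - q)


def pooled_probability(p: float, q: float, alpha: float) -> float:
    """Weighted average of our estimate p and the market's q."""
    return alpha * p + (1.0 - alpha) * q


p, q, alpha = 0.7, 0.5, 0.4                      # illustrative values only
f_full = kelly_fraction(p, q)                    # Kelly on our raw estimate
f_updated = kelly_fraction(pooled_probability(p, q, alpha), q)

print(f"full Kelly on raw estimate: {f_full:.3f}")
print(f"Kelly on updated estimate:  {f_updated:.3f}")
print(f"alpha * full Kelly:         {alpha * f_full:.3f}")
```

Betting Full Kelly on the updated probability and betting $α$ times Kelly on the raw one are the same bet.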

Some properties of $α$ which we might expect:

$0 \le α \le 1$
$α$ depends on our relative confidence in the two forecasts [Again, not always true, but true in some models]

To give a concrete example of this, in his 2020 Presidential Market postmortem Zvi says:

If I was allowed to look at both the models and markets, and also at the news for context, I would combine all sources. If the election is still far away, I would give more weight to the market, while correcting for its biases. The closer to election day, the more I would trust the model over the polls. In the final week, I expect the market mainly to indicate who the favorite is, but not to tell me much about the degree to which they are favored.
If FiveThirtyEight gives either side 90%, and the market gives that side 62%, in 2024, my fair will again be on the order of 80%, depending on how seriously we need to worry about the integrity of the process at that point.
The more interesting scenario for 2024 is FiveThirtyEight is 62% on one side and the market is 62% on the other, and there is no obvious information outside of FiveThirtyEight’s model that caused that divergence. My gut tells me that I have that one about 50% each way.
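We can back out the $α$ implied by those numbers by solving the weighted average for $α$. This is my arithmetic rather than anything Zvi states; I am reading “on the order of 80%” as 0.80 and “about 50% each way” as 0.50.

```python
def implied_alpha(p_model: float, q_market: float, p_fair: float) -> float:
    """Solve p_fair = alpha * p_model + (1 - alpha) * q_market for alpha."""
    return (p_fair - q_market) / (p_model - q_market)


# Scenario 1: model 90%, market 62%, fair on the order of 80%.
alpha_1 = implied_alpha(0.90, 0.62, 0.80)

# Scenario 2: model 62% on one side, market 62% on the other
# (i.e. 38% on the model's side), fair about 50%.
alpha_2 = implied_alpha(0.62, 0.38, 0.50)

print(f"scenario 1: alpha = {alpha_1:.2f}")
print(f"scenario 2: alpha = {alpha_2:.2f}")
```

Under that reading, the first scenario is a bit under two-thirds Kelly and the second is exactly half Kelly.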

Digression on $0 < α < 1$

There are cases where two people coming to independent conclusions about a probability means the best estimate of the probability is higher than either. If our evidence that Bob is the killer is a bloody sock, we might give a probability of 40%; if Alice’s evidence that Bob is the killer is a conversation she overheard, she might give a probability of 20%. Upon hearing Alice’s data, we might update to higher than 40%.

This becomes a little trickier when dealing with betting. If all we know from Alice is she is willing to bet at 20% it’s hard for us to gauge what information she has that we don’t. We’d need to infer it. (Perhaps 20% is much higher than the base-rate, she can’t know about the sock / our evidence and so she must have some evidence we don’t have). The same is also true when dealing with a faceless market. Betting markets don’t announce what’s priced in (and often figuring out what the market has priced into an event is the whole game).

… back to estimating our posterior.

To go back to our earlier model we’re looking at a Bernoulli event and we have a posterior for P. Suppose also the market has a posterior for P and they are both normal with known variances:

$f_{p} \sim N(P, σ_{\text{me}}^{2}), \quad f_{q} \sim N(P, σ_{\text{market}}^{2})$

(This won’t be exactly true, since probabilities need to be positive and less than one, but assuming the $σ$ are small enough this is good enough. We could also do this in log-odds space or similar if need be.) Given $p, q, σ_{\text{me}}, σ_{\text{market}}$, our best estimate / Bayesian posterior for $P$ is:

$\hat{p} = \frac{1}{σ_{\text{me}}^{2} + σ_{\text{market}}^{2}} \left(σ_{\text{market}}^{2} \cdot p + σ_{\text{me}}^{2} \cdot q\right) = α \cdot p + (1 - α) \cdot q$

$α = \frac{σ_{\text{market}}^{2}}{σ_{\text{me}}^{2} + σ_{\text{market}}^{2}}$

This is exactly our model from before with $α$ as our fraction. How to read alpha? “How much of the uncertainty in P is being captured by our estimate vs their estimate?”
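Here is the toy model as a Python sketch, with a cross-check against ordinary inverse-variance weighting of two independent normal estimates. The numbers are illustrative and the function names are mine.

```python
def pool_normal_estimates(p: float, q: float,
                          sigma_me: float, sigma_market: float) -> tuple[float, float]:
    """Return (pooled estimate, alpha) for the two-normal toy model."""
    alpha = sigma_market ** 2 / (sigma_me ** 2 + sigma_market ** 2)
    return alpha * p + (1.0 - alpha) * q, alpha


def inverse_variance_mean(p: float, q: float,
                          sigma_me: float, sigma_market: float) -> float:
    """The same pooled estimate, written as a precision-weighted mean."""
    w_me = 1.0 / sigma_me ** 2
    w_market = 1.0 / sigma_market ** 2
    return (w_me * p + w_market * q) / (w_me + w_market)


p, q = 0.70, 0.50
sigma_me, sigma_market = 0.10, 0.05      # illustrative: the market is twice as precise
p_hat, alpha = pool_normal_estimates(p, q, sigma_me, sigma_market)

print(f"alpha = {alpha:.2f}, pooled estimate = {p_hat:.3f}")
print(f"inverse-variance check = {inverse_variance_mean(p, q, sigma_me, sigma_market):.3f}")
```

With the market's standard deviation at half of ours, we end up at only 20% of Kelly, and the pooled estimate moves most of the way towards the market.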

Clearly this toy model isn’t perfect; I would encourage you to play around with some other examples. The general sentiment is as follows:

You probably believe that the market contains some information you missed. Use a Kelly fraction which accounts for your humility.
Full Kelly—I know everything the market / Alice does. (And my edge is precisely what they don’t know)
¹⁄₂ Kelly—I know as much as the market (but we know different things, otherwise our estimates would be the same)
¹⁄₃ Kelly—I know something the market doesn’t know which is ~1/2 as important as what the market knows.
...
0 Kelly—I know nothing the market doesn’t already know, and I’ve updated my estimate
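The ladder above falls straight out of the toy model if “how much you know” is read as precision, i.e. inverse variance (that reading is my assumption). If your information is worth $r$ times the market's, then $α = r / (1 + r)$:

```python
from fractions import Fraction


def kelly_multiplier(relative_information: Fraction) -> Fraction:
    """alpha = r / (1 + r), where r = (my precision) / (market precision).

    In the two-normal toy model r = sigma_market^2 / sigma_me^2, which gives
    alpha = sigma_market^2 / (sigma_me^2 + sigma_market^2).
    """
    r = Fraction(relative_information)
    return r / (1 + r)


for r in (Fraction(1), Fraction(1, 2), Fraction(1, 10), Fraction(0)):
    print(f"my information / market information = {r}: alpha = {kelly_multiplier(r)}")
```

Full Kelly is the limit as $r$ grows without bound: your information swamps whatever the market has.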

Linking this back to the Baker-McHale formula:

$\frac{\text{edge}^{2}}{\text{edge}^{2} + σ^{2}} = \frac{σ_{\text{market}}^{2}}{σ_{\text{market}}^{2} + σ^{2}}$

“Our edge is the market error, our error is the market’s edge”

Fractional Kelly as risk reduction

So far, everything I’ve suggested only really applies to Full Kelly. I’ve suggested how to reduce the Kelly fraction, but only as a means to achieving a more optimal full Kelly. I think it’s worth reconsidering the more risk averse people we discussed earlier.

I mostly want to spend this section vaguely gesturing towards the literature on fractional Kelly without giving too much detail. In part this is because I think there’s nothing especially persuasive here. Risk tolerance is a personal thing so saying things like “fractional Kelly is optimal in certain circumstances for a certain utility” isn’t going to persuade anyone. That said, there are lots of cases in which fractional Kelly is optimal, and I think it’s worth knowing that. There are also a bunch of properties similar to “Kelly maximises long-run median wealth” which aren’t specifically utility related, but might encourage you to think a little bit about how you calibrate your risk taking.

Fractional Kelly is Mean-Variance optimal

(This is another way of stating the 2-fund theorem in finance)

Given a trade-off between maximising expected returns (equivalently, expected $\log$(wealth)) and holding the variance of returns at a specific level, the optimal strategy is a linear combination of the Kelly strategy and the “hold cash” strategy.

MacLean, Ziemba, Li give a proof of this in the Kelly framework.
Wikipedia gives a proof using the standard Lagrangian set-up (equivalent to MacLean et al)
Hansen and Richard give a more geometric proof

[The discrete time analogy to this is sadly trivial. There’s only one bet to be made, so once you choose your variance, your strategy is always going to be a multiple of the Kelly criterion.]

Fractional Kelly is CRRA Optimal in certain situations

The Constant Relative Risk Aversion utilities are popular in economics (probably because they are tractable). I think (since this post is mostly rules-of-thumb) it’s not unreasonable to see what results come of them and adjust ourselves accordingly. Calculating the optimal bet using CRRA utilities is fairly straightforward in some circumstances (discrete bets and lognormal bets). For the latter, fractional Kelly is optimal.
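For the lognormal case, the cleanest illustration I know is Merton's continuous-time result, which I am importing here as an assumption rather than deriving. With a single risky asset of drift $μ$ and volatility $σ$, a risk-free rate $r$ and relative risk aversion $γ$, the optimal fraction is $(μ - r)/(γ σ^{2})$. Log utility ($γ = 1$) is Kelly, so the CRRA bettor is running a $1/γ$ fractional Kelly. The parameter values below are made up.

```python
def kelly_fraction_lognormal(mu: float, r: float, sigma: float) -> float:
    """Growth-optimal (log-utility) fraction for a lognormal asset: (mu - r) / sigma^2."""
    return (mu - r) / sigma ** 2


def crra_fraction(mu: float, r: float, sigma: float, gamma: float) -> float:
    """Merton fraction for CRRA utility with relative risk aversion gamma."""
    return (mu - r) / (gamma * sigma ** 2)


mu, r, sigma = 0.08, 0.02, 0.20          # illustrative annual drift, rate, volatility
full_kelly = kelly_fraction_lognormal(mu, r, sigma)

for gamma in (1.0, 2.0, 4.0):
    fraction = crra_fraction(mu, r, sigma, gamma)
    print(f"gamma={gamma:.0f}: fraction={fraction:.3f}  "
          f"(= {fraction / full_kelly:.2f} x Kelly)")
```

Read backwards, this gives a way to interpret a Kelly fraction you are comfortable with: half Kelly corresponds to $γ = 2$ in this set-up.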

Fractional Kelly efficiently trades off “growth” against “security”

(This is mostly a summary of Growth Versus Security in Dynamic Investment Analysis and some extensions of the results in MacLean, Ziemba, Li)

Given growth goals:

Expected wealth at a fixed time (I believe this is also true for any utility of wealth, although the algebra is pretty hairy)
Expected growth rate
Expected time to achieve given wealth level U

and security goals:

Probability of having at least a given wealth at time t
Probability wealth is above a specified path
Probability of achieving a wealth U without drawing down to wealth W

The first paper shows that fractional Kelly is monotonic in each of these. That is, any increase in growth is traded off against a decrease in security (and in a convex way). For lognormal assets, the second paper shows that fractional Kelly is optimal. There is a fair amount of handwaving in the literature about how close to optimal fractional Kelly is when we’re not in a lognormal world. It generally appears to be “quite”.
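To make the trade-off concrete, here is a small exact calculation for repeated even-money bets. It is my own toy set-up, not taken from either paper: a 60% coin (Full Kelly is 20%), 100 bets, with “security” measured as the probability of finishing with at least half your starting wealth.

```python
import math


def growth_rate(alpha: float, p: float, f: float) -> float:
    """Expected log-growth per even-money bet when staking alpha * f."""
    return p * math.log(1.0 + alpha * f) + (1.0 - p) * math.log(1.0 - alpha * f)


def prob_wealth_at_least(alpha: float, p: float, f: float, n: int, floor: float) -> float:
    """Exact P(wealth after n bets >= floor), summing over the number of wins."""
    up = math.log(1.0 + alpha * f)
    down = math.log(1.0 - alpha * f)
    total = 0.0
    for wins in range(n + 1):
        if wins * up + (n - wins) * down >= math.log(floor):
            total += math.comb(n, wins) * p ** wins * (1.0 - p) ** (n - wins)
    return total


p, f, n, floor = 0.6, 0.2, 100, 0.5
alphas = (0.25, 0.5, 0.75, 1.0)
growth = [growth_rate(a, p, f) for a in alphas]
security = [prob_wealth_at_least(a, p, f, n, floor) for a in alphas]

for a, g, s in zip(alphas, growth, security):
    print(f"alpha={a:.2f}  growth/bet={g:.4f}  P(wealth >= {floor})={s:.4f}")
```

Moving up towards Full Kelly buys growth and costs security at every step, which is the monotonic trade-off described above (in this toy case at least).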

One way to figure out your Kelly fraction

I am not one of these people who sees massive betting opportunities on a daily basis. (But if you are—let me know, I’m open to seeing any opportunities I’m missing). I think for most people, “true” Kelly betting opportunities come up fairly rarely, but they are the best opportunities to find out what Kelly fraction you are comfortable with:

How much did you bet on Biden? What did you think the fair odds were?
How much did you bet on Mayweather-McGregor?

Conclusion

Never Go Full Kelly
Adjust your posterior for information the market has
Adjust your Kelly fraction by your posterior uncertainty
- $\frac{\text{edge}^{2}}{\text{edge}^{2} + σ^{2}} = \frac{σ_{\text{market}}^{2}}{σ_{\text{market}}^{2} + σ^{2}}$
- “Our edge is the market’s error; the market’s edge is our error”
Consider how aggressive Full Kelly is, and if that’s truly your risk appetite

Thanks to gjm and johnswentworth for comments on earlier versions. I’m still not fully happy with this, but it’s much better than it was. (Improvements are all due to them, issues are all me).
