# abramdemski comments on Never Go Full Kelly

• Thanks for writing this!

Just to be pedantic, I wanted to mention: if we take Fractional Kelly as the average-with-market-beliefs thing, it’s actually full Kelly in terms of our final probability estimate, having updated on the market :)

Concerning your first argument, that uncertainty leads to fractional Kelly—is the idea:

1. We have a probability estimate $\hat{p}$, which comes from estimating the true frequency $p$,

2. Our uncertainty follows a Beta distribution,

3. We have to commit to a fractional Kelly strategy based on our $\hat{p}$ and never update that strategy ever again

?

So the graph shows what happens if we take our uncertainty and keep it as-is, not updating on data, as we continue to update?

Or is it that we keep updating (and hence reduce our uncertainty), but nonetheless, keep our Kelly fraction fixed (so we don’t converge to full Kelly even as we become increasingly certain)?

Also, I don’t understand the graph. (The third graph in your post.) You say that it shows growth rate vs Kelly fraction. Yet it’s labeled “expected utility”. I don’t know what “expected utility” means, since the expected utility should grow unboundedly as we increase the number of iterations.

Or maybe the graph is of a single step of Kelly investment, showing expected log returns? But then wouldn’t Kelly be optimal, given that Kelly maximizes log-wealth in expectation, and in this scenario the estimate is going to be right on average, when we sample from the prior?

Anyway, I’m puzzled about this one. What exactly is the take-away? Let’s say a more-or-less Bayesian person (with uncertainty about their utilities and probabilities) buys the various arguments for Kelly, so they say, “In practice, my utility is more or less logarithmic in cash, at least in so far as it pertains to situations where I have repeated opportunities to invest/​bet”.

Now let’s assume there’s some uncertainty in that. (Bayesians might get a little uncomfortable here: posterior distributions for discrete events are point estimates. Instead imagine you view the event as a Bernoulli random variable with parameter p, and you have a posterior distribution for p.*).

BAYESIAN: Wait, what? I agree I’ll have parameter uncertainty. But we’ve already established that my utility is roughly logarithmic in money. My point estimate (my posterior) for this gamble paying off is $\hat{p}$. The optimal bet under these assumptions is Kelly. So what are you saying? Perhaps you’re arguing that my best-estimate probability isn’t really $\hat{p}$.

OTHER: No, $\hat{p}$ is really your best-estimate probability. I’m pointing to your model uncertainty.

BAYESIAN: Perhaps you’re saying that my utility isn’t really logarithmic? That I should be more risk-averse in this situation?

OTHER: No, my argument doesn’t involve anything like that.

BAYESIAN: So what am I missing? Log utility, probability $\hat{p}$, therefore Kelly.

OTHER: Look, one of the ways we can argue for Kelly is by studying the iterated investment game, right? We look at the behavior of different strategies in the long term in that game. And we intuitively find that strategies which don’t maximize growth (EG the expected-money-maximizer) look pretty dumb. So we conclude that our values must be closer to the growth-maximizer, ie Kelly, strategy.

BAYESIAN: Right; that’s part of what convinced me that my values must be roughly logarithmic in money.

OTHER: So all I’m trying to do is examine the same game. But this time, rather than assuming we know the frequency of success from the beginning, I’m assuming we’re uncertain about that frequency.

BAYESIAN: Right… look, when I accepted the original Kelly argument, I wasn’t really imagining this circumstance where we face the exact same bet over and over. Rather, I was imagining I face lots of different situations. So long as my probabilities are calibrated, the long-run frequency argument works out the same way. Kelly looks optimal. So what’s your beef with me going “full Kelly” on those estimates?

OTHER: In those terms, I’m examining the case where probabilities aren’t calibrated.

BAYESIAN: That’s not so hard to fix, though. I can make a calibration graph of my long-term performance. I can try to adjust my probability estimates based on that. If my 70% probability events tend to come back true 60% of the time, I adjust for that in the future. I’ve done this. You’ve done this.

OTHER: Do you really think your estimates are calibrated, now?

BAYESIAN: Not precisely, but I could put more work into it if I wanted to. Is this your crux? Would you be happy for me to go Full Kelly if I could show you a perfect x=y line on my calibration graph? Are you saying you can calculate the value for my fractional Kelly strategy from my calibration graph?

OTHER: … maybe? I’d have to think about how to do the calculation. But look, even if you’re perfectly calibrated in terms of past data, you might be caught off guard by a sudden change in the state of affairs.

BAYESIAN: Hm. So let’s grant that there’s uncertainty in my calibration graph. Are you saying it’s not my current point-estimate of my calibration that matters, but rather, my uncertainty about my calibration?

OTHER: I fear we’re getting overly meta. I do think the Kelly fraction should be lower the more uncertain you are about your calibration, in addition to being lower the lower your point-estimate calibration is. But let’s get a bit more concrete. Look at the graph. I’m showing that you can expect better returns with a lower Kelly fraction in this scenario. Is that not compelling?

BAYESIAN (who at this point regresses to just being Abram again): See, that’s my problem. I don’t understand the graph. I’m kind of stuck thinking that it represents someone with their hands tied behind their back, like they can’t perform a Bayes update to improve their estimate $\hat{p}$, or they can’t change their Kelly fraction after the start, or something.

• Just to be pedantic, I wanted to mention: if we take Fractional Kelly as the average-with-market-beliefs thing, it’s actually full Kelly in terms of our final probability estimate, having updated on the market :)

Yes—I absolutely should have made that clearer.

Concerning your first argument, that uncertainty leads to fractional Kelly—is the idea:

1. We have a probability estimate $\hat{p}$, which comes from estimating the true frequency $p$,

2. Our uncertainty follows a Beta distribution,

3. We have to commit to a fractional Kelly strategy based on our $\hat{p}$ and never update that strategy ever again

Sort of? 1. Yes, 2. no, 3. kinda.

I don’t think it’s an argument which leads to fractional Kelly. It’s an argument which leads to “less than Kelly with a fraction which varies with your uncertainty”. This (to be clear) is not fractional Kelly, where I think we’re talking about a situation where the fraction is constant.

The chart I presented (copied from the Baker-McHale paper) does assume a beta distribution, and the “rule-of-thumb” which comes from that paper also assumes a beta distribution. The result that “uncertainty ⇒ go sub-Kelly” is robust to different models of uncertainty.

The first argument doesn’t really make a case for fractional Kelly. It makes a case for two things:

• Strong case: you should (unless you have really skewed uncertainty) be betting sub-Kelly

• Rule-of-thumb: you can approximate how much sub-Kelly you should go using this formula. (Which isn’t a fixed

So the graph shows what happens if we take our uncertainty and keep it as-is, not updating on data, as we continue to update?

Yes. Think of it as having a series of bets on different events with the same uncertainty each time.

Also, I don’t understand the graph. (The third graph in your post.) You say that it shows growth rate vs Kelly fraction. Yet it’s labeled “expected utility”. I don’t know what “expected utility” means, since the expected utility should grow unboundedly as we increase the number of iterations.

Or maybe the graph is of a single step of Kelly investment, showing expected log returns? But then wouldn’t Kelly be optimal, given that Kelly maximizes log-wealth in expectation, and in this scenario the estimate is going to be right on average, when we sample from the prior?

Yeah—the latter—I will edit this to make it clearer. This is “expected utility” for one-period. (Which is equivalent to growth rate). I just took the chart from their paper and didn’t want to edit it. (Although that would have made things clearer. I think I’ll just generate the graph myself).

Looking at the bit I’ve emphasised. No! This is the point. When the estimate $\hat{p}$ is too large, this error costs you more than when it’s too small.

I think our confusion is coming from the fact we’re thinking about two different scenarios:

Here I am considering (notice the Kelly fraction depending on inside the utility but not outside). “What is my expected utility, if I bet according to Kelly given my estimate”. (Ans: Not Full Kelly)

I think you are talking about the scenario ? (Ans: Full Kelly)

I’m struggling to extract the right quotes from your dialogue, although I think there are several things where I don’t think I’ve managed to get my message across:

OTHER: In those terms, I’m examining the case where probabilities aren’t calibrated.

I’m trying to find the right Bayesian way to express this, without saying the word “True probability”. Consider a scenario where we’re predicting a lot of (different) sports events. We could both be perfectly calibrated (what you say happens 20% of the time happens 20% of the time) etc, but I could be more “uncertain” with my predictions. If my prediction is always 50-50 I am calibrated, but I really shouldn’t be betting. This is about adjusting your strategy for this uncertainty.

OTHER: So all I’m trying to do is examine the same game. But this time, rather than assuming we know the frequency of success from the beginning, I’m assuming we’re uncertain about that frequency.

BAYESIAN: Right… look, when I accepted the original Kelly argument, I wasn’t really imagining this circumstance where we face the exact same bet over and over. Rather, I was imagining I face lots of different situations. So long as my probabilities are calibrated, the long-run frequency argument works out the same way. Kelly looks optimal. So what’s your beef with me going “full Kelly” on those estimates?

No, my views were always closer to BAYESIAN here. I think we’re looking at a variety of different bets but where my probabilities are calibrated but uncertain. Being calibrated isn’t the same as being right. I have always assumed here that you are calibrated.

BAYESIAN: Not precisely, but I could put more work into it if I wanted to. Is this your crux? Would you be happy for me to go Full Kelly if I could show you a perfect x=y line on my calibration graph? Are you saying you can calculate the value for my fractional Kelly strategy from my calibration graph?

OTHER: … maybe? I’d have to think about how to do the calculation. But look, even if you’re perfectly calibrated in terms of past data, you might be caught off guard by a sudden change in the state of affairs.

No, definitely not. Your calibration graph really isn’t relevant to me here.

BAYESIAN (who at this point regresses to just being Abram again): See, that’s my problem. I don’t understand the graph. I’m kind of stuck thinking that it represents someone with their hands tied behind their back, like they can’t perform a Bayes update to improve their estimate $\hat{p}$, or they can’t change their Kelly fraction after the start, or something.

This is almost certainly “on me”. I really don’t think I’m talking about a person who can’t update their estimate and I advocate people adjusting their fraction. I think there’s something which I’ve not made clear but I’m not 100% sure we’ve found what it is yet.

The strawman of your argument (and I’m struggling to see where you differ from it) is: “A Bayesian with log-utility is repeatedly offered bets (mechanism for choosing bets unclear) against an unfair coin. His prior for the probability that the coin comes up heads is uniform on [0,1]. He should bet Full Kelly with p = 1⁄2 (or slightly less than Full Kelly once he’s updated for the odds he’s offered)”. I don’t think he should take any bets. (I’m guessing you would say that he would update his strategy each time to the point where he no longer takes any bets, but what would he do the first time? Would he take the bet?)

• This (to be clear) is not fractional Kelly, where I think we’re talking about a situation where the fraction is constant.

In the same way that “the Kelly strategy” in practice refers to betting a variable fraction of your wealth (even if the simple scenarios used to illustrate/​derive the formula involve the same bet repeatedly, so the Kelly strategy is one which implies betting a fixed fraction of wealth), I think it’s perfectly sensible to use “fractional Kelly” to describe a strategy which takes a variable fraction of the Kelly bet, using some formula to determine the fraction (even if the argument we use to establish the formula is one where a constant Kelly fraction is optimal).

What I would take issue with would be an argument for fractional Kelly which assumed we should use a constant Kelly fraction (as I said, “tying the agent’s hands” by only looking at strategies where some constant Kelly fraction is chosen). Because then it’s not clear whether some fractional-Kelly is the best strategy for the described scenario; it’s only clear that you’ve found some formula for which fractional-Kelly is best in a scenario, given that you’re using some fractional Kelly.

Which was one of my concerns about what might be going on with the first argument.

The result that “uncertainty ⇒ go sub-Kelly” is robust to different models of uncertainty.

I find myself really wishing that you’d use slightly more Bayesian terminology. Kelly betting is already a rule for betting under uncertainty. You’re specifically saying that meta-uncertainty implies sub-Kelly. (Or parameter uncertainty, or whatever you want to call it.)

I’m trying to find the right Bayesian way to express this, without saying the word “True probability”.

I appreciate the effort :)

So the graph shows what happens if we take our uncertainty and keep it as-is, not updating on data, as we continue to update?

Yes. Think of it as having a series of bets on different events with the same uncertainty each time.

Right… so in this case, it pretty strongly seems to me like the usual argument for Kelly applies. If you have a series of different bets in which you have the same meta-uncertainty, either your meta-uncertainty is calibrated, in which case your probability estimates will be calibrated, so the Kelly argument works as usual, or your meta-uncertainty is uncalibrated, in which case I just go meta on my earlier objections: why aren’t we updating our meta-uncertainty? I’m fine with assuming repeated different bets (from different reference classes) with the same parameter uncertainty being applied to all of them so long as it’s apparently sensible to apply the same meta-uncertainty to all of them. But systematic errors in your parameter uncertainty (such that you can look at a calibration graph and see the problem) should trigger an update in the general priors you’re using.

Here I am considering ∫ (notice the Kelly fraction depending on inside the utility but not outside). “What is my expected utility, if I bet according to Kelly given my estimate”. (Ans: Not Full Kelly)

I think you are talking about the scenario ∫? (Ans: Full Kelly)

(Sorry, had trouble copying the formulae on greaterwrong)

I think what you’re pointing to here is very much like the difference between unbiased estimators and bayes-optimal estimators, right? Frequentists argue that unbiased estimators are better, because given any value of the true parameter, an unbiased estimator is in some sense doing a better job of approximating the right answer. Bayesians argue that Bayesian estimators are better, because of the bias-variance trade-off, and because you expect the Bayesian estimator to be more accurate in expectation (the whole point of accounting for the prior is to be more accurate in more typical situations).

I think the Bayesians pretty decisively win that particular argument; as an agent with a subjective perspective, you’re better off doing what’s best from within that subjective perspective. The Frequentist concept is optimizing based on a God’s-eye view, where we already know $p$. In this case, it leads us astray. The God’s-eye view just isn’t the perspective from which a situated agent should optimize.

Similarly, I think it’s just not right to optimize the formula you give, rather than the one you attribute to me. If I have parameter uncertainty, then my notion of the expected value of using fractional Kelly is going to come from sampling from my parameter uncertainty, and checking what the expected payoffs are for each sample.

But then, as you know, that would just select a Kelly fraction of 1.

So if that formula describes your reasoning, I think you really are making the “true probability” mistake, and that’s why you’re struggling to put it in terms that are less objectionable from the Bayesian perspective. (Which, again, I don’t think is always right, but which I think is right in this case.)
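The contrast between the two ways of taking the expectation can be checked numerically. The following is a minimal sketch under assumptions that are not in the thread: even-money bets, a Beta(6, 4) distribution (mean 0.6), and my own reading of the two objectives, since the original formulae did not survive copying. All function and variable names are hypothetical. The "subjective" objective fixes the estimate at the mean and averages over the uncertain frequency; the "God's-eye" objective fixes the true frequency and averages over estimates scattered around it.

```python
import math

def kelly_stake(q):
    """Full-Kelly stake on an even-money bet with win probability q (no bet without an edge)."""
    return max(0.0, 2.0 * q - 1.0)

def log_growth(stake, p):
    """Expected log-wealth change from staking `stake` at even money when the win probability is p."""
    if stake >= 1.0:
        return float("-inf")  # staking the whole bankroll (or more) risks ruin
    return p * math.log(1.0 + stake) + (1.0 - p) * math.log(1.0 - stake)

def beta_grid(a, b, n=2000):
    """Midpoint grid on (0, 1) with normalised Beta(a, b) weights."""
    xs = [(i + 0.5) / n for i in range(n)]
    ws = [x ** (a - 1) * (1.0 - x) ** (b - 1) for x in xs]
    total = sum(ws)
    return xs, [w / total for w in ws]

A, B = 6, 4                                 # assumed Beta shape, mean 0.6
MEAN = A / (A + B)
XS, WS = beta_grid(A, B)
MULTIPLIERS = [i / 20 for i in range(41)]   # Kelly multipliers 0.00, 0.05, ..., 2.00

def subjective_objective(k):
    """Estimate fixed at the mean; average over the uncertain frequency p."""
    stake = k * kelly_stake(MEAN)
    return sum(w * log_growth(stake, p) for p, w in zip(XS, WS))

def gods_eye_objective(k):
    """True frequency fixed at MEAN; average over estimates q scattered around it."""
    return sum(w * log_growth(k * kelly_stake(q), MEAN) for q, w in zip(XS, WS))

best_subjective = max(MULTIPLIERS, key=subjective_objective)
best_gods_eye = max(MULTIPLIERS, key=gods_eye_objective)
print("best multiplier, subjective objective:", best_subjective)
print("best multiplier, God's-eye objective:", best_gods_eye)
```

Under these assumptions the subjective objective is linear in p, so it depends only on the mean and picks a multiplier of 1, while the God's-eye objective picks a multiplier strictly below 1. The sketch only illustrates that the two objectives disagree; it does not settle which one a situated agent ought to optimize.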

(FYI, I’m not really arguing against fractional Kelly; full Kelly really does seem too high in some sense. I just don’t think this particular argument for fractional Kelly makes sense.)

Consider a scenario where we’re predicting a lot of (different) sports events. We could both be perfectly calibrated (what you say happens 20% of the time happens 20% of the time) etc, but I could be more “uncertain” with my predictions. If my prediction is always 50-50 I am calibrated, but I really shouldn’t be betting. This is about adjusting your strategy for this uncertainty.

I think what’s going on in this example is that you’re setting it up so that I know strictly more about sports than you. You aren’t willing to bet, because anything you know about the situation, I know better. In terms of your post, this is your second argument in favor of fractional Kelly. And I think it’s the explanation here. I don’t think your meta-uncertainty has much to do with it.

Particularly if, as you posit, you’re quite confident that 50-50 is calibrated. You have no parameter uncertainty: your model is that of a fair coin, and you’re confident it’s the best model in the coin-flip model class.

BAYESIAN: Right… look, when I accepted the original Kelly argument, I wasn’t really imagining this circumstance where we face the exact same bet over and over. Rather, I was imagining I face lots of different situations. So long as my probabilities are calibrated, the long-run frequency argument works out the same way. Kelly looks optimal. So what’s your beef with me going “full Kelly” on those estimates?

No, my views were always closer to BAYESIAN here. I think we’re looking at a variety of different bets but where my probabilities are calibrated but uncertain. Being calibrated isn’t the same as being right. I have always assumed here that you are calibrated.

Then you concede the major assumption of BAYESIAN’s argument here! Under the calibration assumption, we can show that the long-run performance of Kelly is optimal (in the peculiar sense of optimality usually applied to Kelly, that is).

I’m curious how you would try and apply something like your formula to the mixed-bet case (ie, a case where you don’t have the same meta-uncertainty each time).

The strawman of your argument (and I’m struggling to see where you differ from it) is: “A Bayesian with log-utility is repeatedly offered bets (mechanism for choosing bets unclear) against an unfair coin. His prior for the probability that the coin comes up heads is uniform on [0,1]. He should bet Full Kelly with p = 1⁄2 (or slightly less than Full Kelly once he’s updated for the odds he’s offered)”. I don’t think he should take any bets. (I’m guessing you would say that he would update his strategy each time to the point where he no longer takes any bets, but what would he do the first time? Would he take the bet?)

Here’s how I would fix this strawman. Note that the fixed strawman is still straw in the sense that I’m not actually arguing for full Kelly, I’m just trying to figure out your argument against it.

“A Bayesian with log-utility is repeatedly offered bets (coming from a rich, complex environment which I’m making no assumptions about, not even computability). His probabilities are, however, calibrated. Then full Kelly will be optimal.”

Probably there are a few different ways to mathify what I mean by “optimal” in this argument. Here are some observations/​conjectures:

• Full Kelly optimizes the expected utility of this agent, obviously. So if the agent really has log utility, and really is a Bayesian, clearly it’ll go full Kelly.

• After enough bets, since we’re calibrated, we can assume that the frequency of success for bets will closely match our probability estimates. So we can make the usual argument that full Kelly will be very close to optimal:

  • Fractional Kelly, or other modified Kelly formulas, will make less money.

  • In general, any other strategy will make less money in the long run, under the assumption that long-run frequencies match probabilities, so long as that strategy does not contain further information about the world.

(For example, in your example where you have an ignorant but calibrated 50-50 model, maybe the true world is “yes on even-numbered dates, no on odd”. A strategy based on this even-odd info could outperform full Kelly, obviously. The claim is that so long as you’re not doing something like that, full Kelly will be approximately best.)
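The long-run claim can be illustrated with a small simulation. This is a sketch under assumptions of my own choosing, not anything specified in the thread: every bet is at even money, the bettor's probability for each bet is drawn uniformly from [0.5, 0.8], and outcomes occur with exactly that probability, so the bettor is calibrated by construction. The function name `simulate_log_wealth` and the choice of multipliers are hypothetical.

```python
import math
import random

def simulate_log_wealth(multiplier, bets):
    """Total log-wealth change from staking `multiplier` times the Kelly fraction on every bet."""
    total = 0.0
    for p, won in bets:
        stake = multiplier * (2.0 * p - 1.0)   # even-money Kelly stake is 2p - 1
        total += math.log(1.0 + stake) if won else math.log(1.0 - stake)
    return total

rng = random.Random(0)                  # fixed seed so the run is repeatable
bets = []
for _ in range(50_000):
    p = rng.uniform(0.5, 0.8)           # the bettor's stated probability for this bet
    won = rng.random() < p              # outcome drawn with that same probability
    bets.append((p, won))

# The same sequence of bets is replayed for each multiplier.
results = {k: simulate_log_wealth(k, bets) for k in (0.5, 1.0, 1.5)}
for k, total in results.items():
    print(f"multiplier {k}: mean log growth per bet = {total / len(bets):.4f}")
```

With calibration built in like this, the full-Kelly run ends with more log-wealth than either the half-Kelly or the 1.5x run. The simulation says nothing about what happens when the stated probabilities are systematically off, which is the case the disagreement is really about.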

I think there’s something which I’ve not made clear but I’m not 100% sure we’ve found what it is yet.

My current estimate is that this is 100% about the frequentist God’s-eye-view way of arguing, where you evaluate the optimality of something by supposing a “true probability” and thinking about how well different strategies do as a function of that.

If so, I’ll be curious to hear your defense of the God’s-eye perspective in this case.

One thing I want to make clear is that I think there’s something wrong with your argument on consequentialist grounds.

Or maybe the graph is of a single step of Kelly investment, showing expected log returns? But then wouldn’t Kelly be optimal, given that Kelly maximizes log-wealth in expectation, and in this scenario the estimate is going to be right on average, when we sample from the prior?

Yeah—the latter—I will edit this to make it clearer. This is “expected utility” for one-period. (Which is equivalent to growth rate). I just took the chart from their paper and didn’t want to edit it. (Although that would have made things clearer. I think I’ll just generate the graph myself).

Looking at the bit I’ve emphasised. No! This is the point.

I want to emphasize that I also think there’s something consequentialistly weird about your position. As non-Bayesian as some arguments for Kelly are, we can fit the Kelly criterion with Bayes, by supposing logarithmic utility. So a consequentialist can see those arguments as just indirect ways of arguing for logarithmic utility.

Not so with your argument here. If we assess a gamble as having probability $\hat{p}$, then what could our model uncertainty have to do with anything? Model uncertainty can decrease our confidence that expected events will happen, but $\hat{p}$ already prices that in. Model uncertainty also changes how we’ll reason later, since we’ll update on the results here (and wouldn’t otherwise do so). But, that doesn’t matter until later.

We’re saying: “Event $A$ might happen, with probability $\hat{p}$; event $\neg A$ might happen, with probability $1-\hat{p}$.” Our model uncertainty grants more nuance to this model by allowing us to update it on receiving more information; but in the absence of such an update, it cannot possibly be relevant to the consequences of our strategies in events $A$ and $\neg A$. Unless there’s some funny updateless stuff going on, which you’re clearly not supposing.

From a consequentialist perspective, then, it seems we’re forced to evaluate the expected utility in the same way whether we have meta-uncertainty or not.