For sure—both my titles were clickbait compared to what I was saying.
I think if I was trying to explain Kelly, I would definitely talk in terms of time-averaging and maximising returns. I hope I wouldn’t do this as an “argument for” Kelly. I think if I was to make an argument for Kelly which is trying to persuade people, it would be something close to my post. (Where I would say “Here are a bunch of nice properties Kelly has + it’s simple + there are easy modifications if it seems too aggressive” and try to gauge from their reactions what I need to talk about).
I will definitely be more careful about how I phrase this stuff though. I think if I wrote both posts again I would think harder about which bits were an “argument” and which bits were guides for intuition.
I actually wouldn’t make very much of a defence for the Peters stuff. I (personally) put little stock in it. (At least, I haven’t found the “Aha!” moment where what they seem to be selling clicks for me).
I think the most interesting thing about Kelly (which has definitely come through over our posts) is that Kelly is a very useful lens into preferences and utilities. (Regardless of which perspective you come from).
Thanks for writing this! I feel like we’re now much closer to each other in terms of what we actually think. I roughly suspect we agree:
Kelly is a litmus test for utilities
For a Bayesian with log-utility Kelly is the end of the story
You think the important bit is the utility, I think the important bit is what it says about people’s utilities.
The simplest product (at least from an understanding point of view) would be VIX futures. These are futures which are (to a first approximation) cash settled to the VIX Index. (You can view the specs here).
One thing to notice is that they expire. This means that if you buy a future to gain exposure, when it expires you lose your exposure. (The same is true of options—when they expire, you lose your optionality. (Actually, you lose some optionality on a daily basis, which is part of why you can’t own / replicate the VIX Index)). This means you have to come up with a strategy to “roll” your exposure before it expires. You can have a look at the term structure of VIX futures here.
Another thing to notice is that VIX futures are the expected value of the index—NOT the index. Typically when vol explodes, the VIX Index goes very high, the front future goes high, the next future less high and so on… Depending on which futures you own, you will make money, but not as much as the index will have moved.
Retail investors typically trade the VIX via ETFs. These tend to formalise a strategy of buying and rolling VIX futures. Generally you can find the details in the ETF docs.
The VIX isn’t tradeable.
There are futures which are based off of the VIX. And there are ETFs which hold portfolios of those futures. These products are very different from “buying” the VIX and I would be very careful when “trading” or “investing” in these products. There are lots of products in this space, and they won’t necessarily behave like you think they will.
Just to be pedantic, I wanted to mention: if we take Fractional Kelly as the average-with-market-beliefs thing, it’s actually full Kelly in terms of our final probability estimate, having updated on the market :)
Yes—I absolutely should have made that clearer.
Concerning your first argument, that uncertainty leads to fractional Kelly—is the idea:
We have a probability estimate ^p, which comes from estimating the true frequency p,
Our uncertainty follows a Beta distribution,
We have to commit to a fractional Kelly strategy based on our ^p and never update that strategy ever again
Sort of? 1. Yes, 2. no, 3. kinda.
I don’t think it’s an argument which leads to fractional Kelly. It’s an argument which leads to “less than Kelly with a fraction which varies with your uncertainty”. This (to be clear) is not fractional Kelly, where I think we’re talking about a situation where the fraction is constant.
The chart I presented (copied from the Baker-McHale paper) does assume a beta distribution, and the “rule-of-thumb” which comes from that paper also assumes a beta distribution. The result that “uncertainty ⇒ go sub-Kelly” is robust to different models of uncertainty.
The first argument doesn’t really make a case for fractional Kelly. It makes a case for two things:
Strong case: you should (unless you have really skewed uncertainty) be betting sub-Kelly
Rule-of-thumb: you can approximate how much sub-Kelly you should go using this formula. (Which isn’t a fixed fraction; it varies with your uncertainty)
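The “uncertainty ⇒ go sub-Kelly” effect is easy to check numerically. Here is a minimal sketch, with all the numbers (even-money odds, a true win probability of 0.55, Beta-distributed estimates with concentration 20) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

p_true = 0.55          # true win probability (illustrative assumption)
b = 1.0                # even-money odds
conc = 20              # Beta concentration: lower = more uncertain estimates
p_hat = rng.beta(conc * p_true, conc * (1 - p_true), size=100_000)

def growth(lam):
    # Bet lam * Kelly(p_hat); the coin actually pays out according to p_true.
    f = np.clip(lam * (p_hat - (1 - p_hat) / b), 0.0, 0.99)
    return np.mean(p_true * np.log1p(b * f) + (1 - p_true) * np.log1p(-f))

lams = np.linspace(0.0, 1.5, 31)
best = lams[np.argmax([growth(l) for l in lams])]
print(best)  # lands below 1: overbetting on high estimates costs more than
             # underbetting on low ones, so the optimum is sub-Kelly
```

With no estimation noise the optimum would be exactly 1 (full Kelly); adding noise drags it down, which is the “strong case” above.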
So the graph shows what happens if we take our uncertainty and keep it as-is, not updating on data, as we continue to update?
Yes. Think of it as having a series of bets on different events with the same uncertainty each time.
Also, I don’t understand the graph. (The third graph in your post.) You say that it shows growth rate vs Kelly fraction. Yet it’s labeled “expected utility”. I don’t know what “expected utility” means, since the expected utility should grow unboundedly as we increase the number of iterations.
Or maybe the graph is of a single step of Kelly investment, showing expected log returns? But then wouldn’t Kelly be optimal, given that Kelly maximizes log-wealth in expectation, and in this scenario the estimate ^p is going to be right on average, when we sample from the prior?
Yeah—the latter—I will edit this to make it clearer. This is “expected utility” for one-period. (Which is equivalent to growth rate). I just took the chart from their paper and didn’t want to edit it. (Although that would have made things clearer. I think I’ll just generate the graph myself).
Looking at the bit I’ve emphasised. No! This is the point. When ^p is too large, this error costs you more than when it’s too small.
I think our confusion is coming from the fact we’re thinking about two different scenarios:
Here I am considering ∫ ( p·u(1 + b·k_f(^p)) + (1 − p)·u(1 − k_f(^p)) ) f(^p) d^p (notice the Kelly fraction depending on ^p inside the utility, but the true p outside). “What is my expected utility, if I bet according to Kelly given my estimate”. (Ans: Not Full Kelly)
I think you are talking about the scenario ∫ ( ^p·u(1 + b·k_f(^p)) + (1 − ^p)·u(1 − k_f(^p)) ) f(^p) d^p? (Ans: Full Kelly)
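The difference between the two scenarios can be checked numerically. A sketch, assuming (purely for illustration) log utility, even-money odds, and a Beta(8, 6) distribution for the estimate ^p:

```python
import numpy as np

b_odds = 1.0                                  # even-money bet
grid = np.linspace(0.001, 0.999, 2001)
w = grid**7 * (1 - grid)**5                   # Beta(8, 6) prior, unnormalised
w /= w.sum()
p_true = np.sum(w * grid)                     # scenario 1: fixed true p = prior mean

def kelly(q, lam):
    # Fraction lam of the Kelly bet implied by estimate q, long-only
    return np.clip(lam * (q - (1 - q) / b_odds), 0.0, 0.99)

def eu_scenario1(lam):
    # True p outside the utility, estimate inside: bet on your estimate,
    # get paid by reality
    f = kelly(grid, lam)
    return np.sum(w * (p_true * np.log1p(b_odds * f)
                       + (1 - p_true) * np.log1p(-f)))

def eu_scenario2(lam):
    # Estimate both inside and outside: your estimate IS the outcome probability
    f = kelly(grid, lam)
    return np.sum(w * (grid * np.log1p(b_odds * f)
                       + (1 - grid) * np.log1p(-f)))

lams = np.linspace(0.0, 1.5, 151)
best1 = lams[np.argmax([eu_scenario1(l) for l in lams])]
best2 = lams[np.argmax([eu_scenario2(l) for l in lams])]
print(best1, best2)  # scenario 1 peaks below 1 (sub-Kelly); scenario 2 at 1
```

In scenario 2 each term of the integral is maximised at the full Kelly fraction, so the sum is too; in scenario 1 the mismatch between estimate and truth pulls the optimum below full Kelly.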
I’m struggling to extract the right quotes from your dialogue, but there are several things where I don’t think I’ve managed to get my message across:
OTHER: In those terms, I’m examining the case where probabilities aren’t calibrated.
I’m trying to find the right Bayesian way to express this, without saying the word “True probability”. Consider a scenario where we’re predicting a lot of (different) sports events. We could both be perfectly calibrated (what you say happens 20% of the time happens 20% of the time) etc, but I could be more “uncertain” with my predictions. If my prediction is always 50-50 I am calibrated, but I really shouldn’t be betting. This is about adjusting your strategy for this uncertainty.
OTHER: So all I’m trying to do is examine the same game. But this time, rather than assuming we know the frequency of success from the beginning, I’m assuming we’re uncertain about that frequency.
BAYESIAN: Right… look, when I accepted the original Kelly argument, I wasn’t really imagining this circumstance where we face the exact same bet over and over. Rather, I was imagining I face lots of different situations. So long as my probabilities are calibrated, the long-run frequency argument works out the same way. Kelly looks optimal. So what’s your beef with me going “full Kelly” on those estimates?
No, my views were always closer to BAYESIAN here. I think we’re looking at a variety of different bets, where my probabilities are calibrated but uncertain. Being calibrated isn’t the same as being right. I have always assumed here that you are calibrated.
BAYESIAN: Not precisely, but I could put more work into it if I wanted to. Is this your crux? Would you be happy for me to go Full Kelly if I could show you a perfect x=y line on my calibration graph? Are you saying you can calculate the α value for my fractional Kelly strategy from my calibration graph?
OTHER: … maybe? I’d have to think about how to do the calculation. But look, even if you’re perfectly calibrated in terms of past data, you might be caught off guard by a sudden change in the state of affairs.
No, definitely not. Your calibration graph really isn’t relevant to me here.
BAYESIAN (who at this point regresses to just being Abram again): See, that’s my problem. I don’t understand the graph. I’m kind of stuck thinking that it represents someone with their hands tied behind their back, like they can’t perform a Bayes update to improve their estimate ^p, or they can’t change their α after the start, or something.
This is almost certainly “on me”. I really don’t think I’m talking about a person who can’t update their estimate, and I advocate people adjusting their fraction. I think there’s something I’ve not made clear, but I’m not 100% sure we’ve found what it is yet.
The strawman of your argument (where I’m struggling to understand how you differ) is: “A Bayesian with log-utility is repeatedly offered bets (mechanism for choosing bets unclear) against an unfair coin. His prior on the probability that the coin comes up heads is uniform on [0,1]. He should bet Full Kelly with p = 1⁄2 (or slightly less than Full Kelly once he’s updated for the odds he’s offered)”. I don’t think he should take any bets. (I’m guessing you would say that he would update his strategy each time to the point where he no longer takes any bets—but what would he do the first time? Would he take the bet?)
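As a sanity check on the strawman, here is a one-shot version, assuming (since the scenario leaves it unspecified) that the bet on offer is at even odds:

```python
import numpy as np

# Uniform prior over the coin's heads probability, even-odds bet, log utility.
q = np.linspace(0.001, 0.999, 999)          # prior grid, equal weights
fractions = np.linspace(0.0, 0.99, 100)     # candidate bet sizes

def expected_log_wealth(f):
    # Average over the prior of q*log(1+f) + (1-q)*log(1-f)
    return np.mean(q * np.log1p(f) + (1 - q) * np.log1p(-f))

best = fractions[np.argmax([expected_log_wealth(f) for f in fractions])]
print(best)   # 0.0: prior mean is 1/2, so at even odds there is no edge
```

With the prior mean at 1⁄2 and even odds, every positive bet has negative expected log wealth, so even on the first bet he stakes nothing; he only bets once the odds offered beat his current posterior.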
I linked several papers, is there one in particular you are referring to and a section I could make clearer?
Roughly speaking, it’s about “when” you take square roots and what that means for the product you are trading. Here is a handy guide on a zoo of vol/var swap/forward/future products.
The key thing is less about what “volatility” and “variance” have been. (Realised volatility is the square root of realised variance). We’re talking about the expectation for the next month’s volatility or variance.
The “mathematician” way to think about this (although I think this is a little unhelpful) is E(√X) ≤ √E(X). If X is (future) realised variance (as yet unknown), then the former is “volatility” and the latter is “square root of variance” (what I call “variance in vol units”). Therefore “expected volatility” is lower than “square root of expected variance”. The difference is what needs compensating.
The more practical way to think about this, is that variance is being dominated much more by the tails (or volatility of volatility). When you trade a variance, you need a premium over volatility to compensate you for these tails (even if they don’t realise very often).
Another way to think about this is that there is “convexity” in variance (when measured in units of volatility). If you are long and volatility goes up, you make much more (because it’s squared), but if it goes down, you don’t lose as much.
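The E(√X) ≤ √E(X) gap is easy to see by simulation. A sketch, assuming (purely for illustration) that future realised variance is lognormal, i.e. fat-tailed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: future realised variance X is lognormal (fat-tailed).
X = rng.lognormal(mean=np.log(0.04), sigma=0.8, size=1_000_000)

expected_vol = np.mean(np.sqrt(X))      # E[sqrt(X)]: "expected volatility"
var_in_vol_units = np.sqrt(np.mean(X))  # sqrt(E[X]): "variance in vol units"
print(expected_vol, var_in_vol_units)   # the first is strictly smaller (Jensen)
```

The fatter the tails of X (the higher the vol of vol), the wider this gap, which is exactly the premium the variance seller needs.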
What unit of information does the VIX track? the volatility of the S&P 500 index over the next 30 days, annualized. What does this mean?
The VIX tracks the variance, not the volatility, of the S&P. (Slightly more subtly, it measures the variance in vol units). (This twitter thread does a decent job of explaining the difference and why it matters)
This was fascinating. Thanks for taking the time to write it. I agree with the vast majority of what you wrote, although I don’t think it actually applies to what I was trying to do in this post. I don’t disagree that a full-Bayesian finds this whole thing a bit trivial, but I don’t believe people are fully Bayesian (to the extent they know their utility function) and therefore I think coming up with heuristics is valuable to help them think about things.
So, similarly, I see the Peters justification of Kelly as ultimately just a fancy way of saying that taking the logarithm makes the math nice. You’re leaning on that argument to a large extent, although you also cite some other properties which I have no beef with.
I don’t really think of it as much as an “argument”. I’m not trying to “prove” Kelly criterion. I’m trying to help people get some intuition for where it might come from and some other reasons to consider it if they aren’t utility maximising.
It’s interesting to me that you brought up the exponential St Petersburg paradox, since MacLean, Thorp, Ziemba claim that the Kelly criterion can also handle it, although I personally haven’t gone through the math.
Yeah, I think I’m about to write a reply to your massive comment, but I think I’m getting closer to understanding. I think what I really need to do is write my “Kelly is Black-Scholes for utility” post.
I think that (roughly) this post isn’t aimed at someone who has already decided what their utility is. Most of the examples you didn’t like / saw as non-sequiturs were explicitly given to help people think about their utility.
Yes—I cited Peters in the post (and stole one of their images). Personally I don’t actually think what they are doing has as much value as they seem to think, although that’s a whole other conversation. I basically think something akin to your third bullet point.
Having read your comments on the other post, I think I understand your critique, and I don’t think there’s much more to be said if you take the utility as axiomatic. However, I guess the larger point I’m trying to make is there are other reasons to care about Kelly other than if you’re a log-utility maximiser. (Several of which you mention in your post)
Yeah—I agree, that was what I was trying to get at. I tried to address (the narrower point) here:
Compounding is multiplicative, so it becomes “natural” (in some sense) to transform everything by taking logs.
But I agree giving some examples of where it doesn’t apply would probably have been helpful to demonstrate when it is useful
Thanks! That’s helpful. I definitely wrote this rather stream-of-consciousness, and I was more amped up about what I was going to say at the start than I was by the time I’d gotten halfway through. EDIT: I’ve changed the title and added a note at the top.
In the section where I say “it doesn’t matter how you think about this”, I mean it in the sense of: “Prices and vols are equivalent in a Black-Scholes world, it doesn’t matter if you think in terms of prices or vols, but thinking in terms of vols is usually much more helpful”.
I also agree that having a handy version of the formula is useful. I basically think of Kelly in the format you use in the comment I highlighted, and I think I would never have written this if someone else hadn’t taken that comment, butchered it a little, and turned it into a (somewhat) popular post. (Roughly: I started writing a long, fairly negative comment on that post, and tried to turn it into something more positive. I see I didn’t quite manage to avoid all the anger issues that entails).
I’m not sure what prompted all of this effort,
The comments section here and the post and comments section here. To be completely frank, my post started out as a comment similar to yours in those threads. “I’m not sure what led you to post this”. (Especially the Calculating Kelly post which seemed to mostly copy and make worse this comment).
I’ve rarely heard Kelly described as corresponding to log utility,
I actually agree with you that aside from LW I haven’t really seen Kelly discussed in the context of log-utilities, which is why I wanted to address this here rather than anywhere else.
only ever as an aside about mean-variance optimization
Okay, here our experiences differ. I see Kelly coming up in all sorts of contexts, not just relating to mean-variance portfolio optimization for a CRRA-utility or whatever.
If anything, I’d say that the Kelly—log utility connection obviously suggests one point, which is that most people are far too risk-averse (less normatively, most people don’t have log utility functions). The exception is Buffett—empirically he does, subject to leverage constraints.
So I agree with this. I’d quite happily write the “you are too risk averse” post, but I think Putanumonit already did a better job than I could hope to do on that.
A couple of reasons:
For whatever reason, people seem to really like Kelly criterion related posts at the moment.
I think Kelly is a good framework for thinking about things
“Kelly is about repeated bets” could easily be “Kelly is about bet sizing”
“Kelly is Black-Scholes for utility”
Kelly is optimal (in some very concrete senses) and fractional Kelly is optimal in some other senses which I think people don’t discuss enough
This is my first post, so I would appreciate any feedback. This started out as a comment on one of the other threads but kept on expanding from there.
I’m also tempted to write “Contra Kelly Criterion” or “Kelly is just the start”, where I write a rebuttal to using Kelly. (Rough sketch: Kelly is not enough, you need to understand your edge; Kelly is too volatile). Or “Fractional Kelly is the true Kelly” (either a piece about how fractional Kelly accounts for your uncertainty vs market uncertainty, OR a piece about how fractional Kelly is about power utilities, OR a piece about how fractional Kelly is optimal in some risk-return sense)
Intuitively, if I think something has a ~10% chance of happening, I want at least a 10x payout. (Before even worrying about Kelly).
This also seems like a better way to intuit approximate answers. If I think an event has a 12% chance, and the potential payoff of a bet is to multiply my investment by 3.7, then I can’t immediately tell you what the Kelly bet is. However, I can immediately tell you that 1/3.7 is less than halfway along the distance from 12% to 100%, or that it’s more than a tenth of the way. So I know the Kelly bet isn’t so much as half the bankroll, but it also isn’t so little as 10%.
1/3.7 ≈ 27% >> 12%, so you shouldn’t be betting anything.
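Plugging the numbers into the standard Kelly formula confirms this (here b is the net odds, i.e. the payout multiple minus one):

```python
def kelly_fraction(p, payout_multiple):
    # f* = p - (1 - p) / b, with net odds b = payout_multiple - 1
    b = payout_multiple - 1
    return p - (1 - p) / b

f = kelly_fraction(0.12, 3.7)
print(f)   # about -0.21: the breakeven probability 1/3.7 ~ 27% exceeds 12%
```

A negative Kelly fraction just means the bet has negative edge at those odds, so the correct stake is zero (or, if it were possible, to take the other side).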