# Oscar_Cunningham

Karma: 6,367
• Yup, you want a bi-interpretation:

Two models or theories are mutually interpretable, when merely each is interpreted in the other, whereas bi-interpretation requires that the interpretations are invertible in a sense after iteration, so that if one should interpret one model or theory in the other and then re-interpret the first theory inside that, then the resulting model should be definably isomorphic to the original universe

Bi-interpretation in weak set theories

• I don’t think that’s the only reason—if I value something linearly, I still don’t want to play a game that almost certainly bankrupts me.

I still think that’s because you intuitively know that bankruptcy is worse-than-linearly bad for you. If your utility function were truly linear then it’s true by definition that you would trade an arbitrary chance of going bankrupt for a tiny chance of a sufficiently large reward.

I mean, that’s not obvious—the Kelly criterion gives you, in the example with the game, E(money) = \$240, compared to \$246.61 with the optimal strategy. That’s really close.

Yes, but the game is very easy, so a lot of different strategies get you close to the cap.

• It bankrupts you with probability 1 − 0.6^300, but in the other 0.6^300 of cases you get a sweet sweet \$25 × 2^300. This nets you an expected \$1.42 × 10^25.

Whereas Kelly betting only has an expected value of \$25 × (0.6×1.2 + 0.4×0.8)^300 = \$3220637.15.

Obviously humans don’t have linear utility functions, but my point is that the Kelly criterion still isn’t the right answer when you make the assumptions more realistic. You actually have to do the calculation with the actual utility function.
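The two expected values above are easy to check numerically. This is a minimal sketch, assuming an uncapped even-money game with a 60% win chance and \$25 starting stake as in the comment; the variable names are made up for illustration.

```python
# Expected final wealth after 300 even-money bets with a 60% win chance,
# starting from $25 and with no cap (the assumptions used in the comment above).
start = 25.0
rounds = 300
p_win = 0.6

# Betting everything each round: you only keep money if you win every flip.
p_survive = p_win ** rounds
all_in_ev = p_survive * start * 2 ** rounds  # same as start * 1.2 ** rounds

# Kelly betting: stake a fixed fraction f = 2p - 1 = 0.2 of current wealth.
kelly_fraction = 2 * p_win - 1
growth_per_round = p_win * (1 + kelly_fraction) + (1 - p_win) * (1 - kelly_fraction)
kelly_ev = start * growth_per_round ** rounds

print(f"P(all-in survives) = {p_survive:.3e}")
print(f"E[all-in]          = {all_in_ev:.3e}")
print(f"E[Kelly]           = {kelly_ev:,.2f}")
```

The all-in figure is carried almost entirely by a vanishingly unlikely outcome, which is why the two numbers differ by so many orders of magnitude.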

• The answer is that you bet approximately Kelly.

No, it isn’t. Gwern never says that anywhere, and it’s not true. This is a good example of what I’m saying.

For clarity the game is this. You start with \$25 and you can bet any multiple of \$0.01 up to the amount you have. A coin is flipped with a 60/40 bias in your favour. If you win you double the amount you bet, otherwise you lose it. There is a cap of \$250, so after each bet you lose any money over this amount (so in fact you should never make a bet that could take you over). This continues for 300 rounds.

Bob’s edge is 20%, so the Kelly criterion would recommend that he bets \$5. If he continues to use the Kelly criterion in every round (except if this would take him over the cap, in which case he bets to take him to the cap) he ends with an average of \$238.04.

As explained on the page you link to, the optimal strategy and expected value can be calculated inductively based on the number of bets remaining. The optimal starting bet is \$1.99, and if you continue to bet optimally your average amount of money is \$246.61.

So in this game the optimal starting bet is only about 40% of the Kelly bet. The Kelly strategy bets too riskily, and leaves \$8.57 on the table compared to the optimal strategy.

Kelly isn’t optimal in any limit either. As the number of rounds goes to infinity, the optimal strategy is to bet just \$0.01, since this maximises the likelihood of never going bankrupt. If instead the cap goes to infinity then the optimal strategy is to bet everything on every round. Of course you could tune the cap and the number of rounds together so that Kelly was optimal on the first bet, but then it still wouldn’t be optimal for subsequent bets.

(EDIT: It’s actually not certain that the optimal strategy in the first round is \$1.99, since floating point accuracy in the computations becomes relevant and many starting bets give the same result. But \$5 is so far from optimum that it genuinely did give a lower expected value, so we can say for certain that Kelly is not optimal.)
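The inductive calculation described above can be sketched as a small dynamic program. This is only an illustration under simplifying assumptions: bets are in whole units (for example dollars) rather than multiples of \$0.01, so it will not reproduce the exact figures quoted, and the function names `optimal_values` and `kelly_values` are hypothetical.

```python
def optimal_values(cap, rounds, p_win=0.6):
    """Return (values, first_bets) for the capped game.

    values[w] is the best achievable expected final wealth when holding w
    units with `rounds` bets left; first_bets[w] is a bet achieving it.
    """
    values = [float(w) for w in range(cap + 1)]  # no bets left: keep what you have
    best_bets = [0] * (cap + 1)
    for _ in range(rounds):
        new_values, best_bets = [], []
        for w in range(cap + 1):
            best_value, best_bet = values[w], 0  # betting nothing is always allowed
            # Never bet more than you hold, and never overshoot the cap.
            for bet in range(1, min(w, cap - w) + 1):
                ev = p_win * values[w + bet] + (1 - p_win) * values[w - bet]
                if ev > best_value:
                    best_value, best_bet = ev, bet
            new_values.append(best_value)
            best_bets.append(best_bet)
        values = new_values
    return values, best_bets


def kelly_values(cap, rounds, p_win=0.6):
    """Expected final wealth when always betting the Kelly fraction,
    rounded to whole units and truncated so as to land on the cap at most."""
    fraction = 2 * p_win - 1  # Kelly fraction for an even-money bet
    values = [float(w) for w in range(cap + 1)]
    for _ in range(rounds):
        new_values = []
        for w in range(cap + 1):
            bet = min(round(fraction * w), cap - w)
            new_values.append(p_win * values[w + bet] + (1 - p_win) * values[w - bet])
        values = new_values
    return values


if __name__ == "__main__":
    opt, first_bets = optimal_values(cap=250, rounds=300)
    kelly = kelly_values(cap=250, rounds=300)
    print("optimal EV:", opt[25], "first bet:", first_bets[25])
    print("Kelly EV:  ", kelly[25])
```

Because the optimal recursion maximises over every allowed bet, including the one Kelly would pick, its value can never be below the Kelly value for the same state. Working in cents instead of whole units uses the same recursion but needs a much larger table.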

• If Bob wants to maximise his money at the end, then he really should bet it all every round. I don’t see why you would want to use Kelly rather than maximising expected utility. Not maximising expected utility means that you expect to get less utility.

• Can you be more precise about the exact situation Bob is in? How many rounds will he get to play? Is he trying to maximise money, or trying to beat Alice? I doubt the Kelly criterion will actually be his optimal strategy.

• I tend to view the golden ratio as the least irrational irrational number. It fills in the next gap after all the rational numbers. In the same way, 1/2 is the noninteger which shares the most algebraic properties with the integers, even though it’s furthest from them in a metric sense.

• Nice idea! We can show directly that each term provides information about the next.

The density function of the distribution of the fractional part in the continued fraction algorithm converges to 1/[(1+x) ln(2)] (it seems this is also called the Gauss-Kuzmin distribution, since the two are so closely associated). So we can directly calculate the probability of getting a coefficient of n by integrating this from 1/(n+1) to 1/n, which gives -lg(1-1/(n+1)^2) as you say above. But we can also calculate the probability of getting an n followed by an m, by integrating this from 1/(n+1/m) to 1/(n+1/(m+1)), which gives -lg(1-1/[(mn+1)(mn+m+n+2)]). So dividing one by the other gives P(m|n) = lg(1-1/[(mn+1)(mn+m+n+2)])/lg(1-1/(n+1)^2), which is rather ugly, but the point is that it does depend on n.

This turns out to be an anticorrelation. High numbers are more likely to be followed by low numbers, and vice-versa. The probability of getting a 1 given you’ve just had a 1 is 36.6%, whereas if you’ve just had a very high number the probability of getting a 1 will be very close to 50% (since the distribution of the fractional part is tending to uniform).
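For anyone who wants to play with these numbers, here is a small sketch of the formulas above. The function names are made up for illustration, and `log1p` is used only to keep the result accurate when n is large.

```python
import math


def p_coefficient(n):
    """P(coefficient = n) under the limiting density: -lg(1 - 1/(n+1)^2)."""
    return -math.log1p(-1.0 / (n + 1) ** 2) / math.log(2)


def p_pair(n, m):
    """P(coefficient n followed by m): -lg(1 - 1/[(mn+1)(mn+m+n+2)])."""
    denom = (m * n + 1) * (m * n + m + n + 2)
    return -math.log1p(-1.0 / denom) / math.log(2)


def p_next_given(m, n):
    """P(next coefficient = m | current coefficient = n)."""
    return p_pair(n, m) / p_coefficient(n)


if __name__ == "__main__":
    print("P(1)         =", p_coefficient(1))
    print("P(1 | 1)     =", p_next_given(1, 1))
    print("P(1 | 10**6) =", p_next_given(1, 10**6))
```

Comparing `p_next_given(1, n)` across a few values of n shows the dependence on n directly: it starts near 0.366 at n = 1 and climbs towards 0.5.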

• I’ve only been on Mastodon a bit longer than the current Twitter immigrants, but as far as I know there’s no norm against linking. But the server admins are all a bit stressed by the increased load. So I can understand why they’d be annoyed by any link that brought new users. I’ve been holding off on inviting new users to the instance I’m on, because the server is only just coping as it is.

• Apologies for asking an object level question, but I probably have Covid and I’m in the UK which is about to experience a nasty heatwave. Do we have a Covid survival guide somewhere?

(EDIT: I lived lol)

• Is there a way to alter the structure of a futarchy to make it follow a decision theory other than EDT?

• Or is that still too closely tied to the explore-exploit paradigm?

Right. The setup for my problem is the same as the ‘bernoulli bandit’, but I only care about the information and not the reward. All I see on that page is about exploration-exploitation.

• What’s the term for statistical problems that are like exploration-exploitation, but without the exploitation? I tried searching for ‘exploration’ but that wasn’t it.

In particular, suppose I have a bunch of machines which each succeed or fail independently with a probability that is fixed separately for each machine. And suppose I can pick machines to sample to see if they succeed or fail. How do I sample them if I want to become 99% certain that I’ve found the best machine, while using the fewest samples?

The difference with exploration-exploitation is that this is just a trial period, and I don’t care about how many successes I get during this testing. So I want something like Thompson sampling, but for my purposes Thompson sampling oversamples the machine it currently thinks is best because it values getting successes rather than ruling out the second-best options.

• The problem is that this measures their amount of knowledge about the questions as well as their calibration.

My model would be as follows. For a fixed source of questions, each person has a distribution describing how much they know about the questions. It describes how likely it is that a given question is one they should say p on. Each person also has a calibration function f, such that when they should say p they instead say f(p). Then by assigning priors over the spaces of these distributions and calibration functions, and applying Bayes’ rule, we get a posterior describing what we know about that person’s calibration function.

Then assign a score to each calibration function which is the expected log score lost by a person using that calibration function instead of an ideal one, assuming that the questions were uniformly distributed in difficulty for them. Then their final calibration score is just the expected value of that score given our distribution of calibration functions for them.
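As a rough illustration of the score for a single calibration function, here is a minimal sketch. It assumes "uniformly distributed in difficulty" means the correct probability p is uniform on (0, 1), and it takes the expected log score lost by saying f(p) instead of p to be the KL divergence between the two Bernoulli distributions. The `overconfident` function is a hypothetical example, not something from the model above.

```python
import math


def log_score_loss(p, q):
    """Expected log score (in nats) lost by reporting q when p is correct."""
    loss = 0.0
    if p > 0:
        loss += p * math.log(p / q)
    if p < 1:
        loss += (1 - p) * math.log((1 - p) / (1 - q))
    return loss


def calibration_penalty(f, steps=10_000):
    """Average log score loss of calibration function f over p uniform on (0, 1),
    approximated with the midpoint rule (assumption: uniform difficulty)."""
    h = 1.0 / steps
    return h * sum(log_score_loss((i + 0.5) * h, f((i + 0.5) * h)) for i in range(steps))


def overconfident(p):
    """Hypothetical calibration function that squares the odds,
    pushing every answer away from 0.5."""
    return p**2 / (p**2 + (1 - p) ** 2)


if __name__ == "__main__":
    print("ideal:        ", calibration_penalty(lambda p: p))
    print("overconfident:", calibration_penalty(overconfident))
```

The final score described above would then be the average of `calibration_penalty` over the posterior on calibration functions, which this sketch does not attempt.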