Thank you for sharing that Bayesian Beauty paper, its very clear.
I think the OP is onto something useful when they distinguish between (my phrasing) the approach such that “my probabilities are calibrated such that they represent the proportion of timelines in which I am right” and the approach that means “my probabilities are calibrated with respect to the number of times I answer questions about them”.
I take the “timelines” (first) option. The second one seems odd to me, for example say someone considers a multiple choice question, decides that answer (a) is correct. They are told they have to wait in the examination room while other candidates finish. During this 10 minute wait they spend 9 minutes confident (a) was the right choice, then with 1 minute to go, realise (b) is actually right and change their answer. Here I would only judge the correctness (or otherwise) of the final choice, (b), and it would not be relevant that they spent more time believing (a). Similarly, if Beauty is woken for a second time any guesses she makes about the coin on this second waking should be taken to replace any guesses she made on the first waking (if you take the timelines point of view). I think we should only mark “final answers”, otherwise we end up in really weird territory where if you think something is true with 2/3rds probability, but you used to think it was true with 1/3rd probability, then you should exaggerate your newfound belief that it is more likely in order to be more right on a temporal average.
Argument against marking only final answers: on Wednesday Beauty wakes up, and learns that the experiment has concluded. At this time her credence for P(heads) is 1⁄2. This is true both on a per-Wednesday-awakening basis and on a per-coin-flip basis, regardless of whether she is a thirder, halfer, or something else. If we only mark final answers, the question of the correct probability on Monday and Tuesday is left with no answer.
I agree that you can make a betting/scoring setup such that betting/predicting at 50% is correct. Eg, suppose that on both Monday and Tuesday, Beauty gets to make a $1 bet at some odds. If she’s asleep on Tuesday then whatever bet she made on Monday is repeated. In that case she should bet at 1:1 odds. With other scoring setups we can get different answers. Overall, this makes me think that this approach is misguided, unless we have a convincing argument about which scoring setup is correct.
I think the exam hypothetical creates misleading intuitions because written exams are generally graded based on the final result. Whereas if this was an oral examination in spoken language, a candidate who spent the first nine minutes with incorrect beliefs about grammatical gender would lose points. But certainly if I knew my life was being scored based on my probabilities at the moment of death, I would optimize for dying in a place of accurate certainty.
Yes, the Wednesday point is a good one, so it the oral exam comparison.
I think we agree that the details of the “scoring system” completely change the approach beauty should take. This is not true for most probability questions. Like, if she can bet 1$ at some odds each time she wakes up then it makes sense for her policy going in to more heavily weight the timeline in which she gets to bet twice. As you point out if her sleeping self repeats bets that changes things. If the Tuesday bet is considered to be “you can take the bet, but it will replace the one you may or may not have given on a previous day if their was one”, then things line up to half again. If she has to guess the correct outcome of the coin flip, or else die once the experiment is over, then the strategy where she always guesses heads is just as good as the one where she always guesses tails. Her possible submissions are [H, T, HH, TT], two of which result in death. Where we differ is that you think the details of the scoring system being relevant suggest the approach is misguided. In contrast I think the fact that scoring system details matter is the entire heart of the problem. If I take probabilities as “how I should bet” then the details of the bet should matter. If I take probabilities as frequencies then I need to decide whether the denominator is “per timeline” or “per guess”. I don’t think the situation allows one to avoid these choices, and (at least to me) once these are identified as choices, with neither option pushed upon us by probability theory, the whole situation appears to make sense.
Frequentist example—If you understood me above you certainly don’t need this example so skip ---: The experiment is repeated 100 times, with 50 heads and 50 tails on the coin flip. A total of 150 guesses are made by beauty. BeautyV1.0 said the probability was 50⁄50 every time. This means that for 100 of the times she answered 50⁄50 it was actually tails, and 50 times she answered this same way it was actually heads. So she is poorly calibrated with respect to the number of times she answered. By this scoring system BeautyV2.0 who says “1/3” appears better.
However, in every single trial, even the ones where beauty is consulted twice, she only provides one unique answer. If beauty is a simple program call in a simulation then she is a memoryless program, in some cases evaluated twice. In half the trials it was heads, in half it was tails, in every trial BeautyV1.0 said it was 50⁄50. So she is well calibrated per trial, but not per evaluation. BeautyV2.0 is the other way around.
the details of the “scoring system” completely change the approach beauty should take. This is not true for most probability questions.
I believe this IS true for most probability questions. However, most probability questions have only one reasonable scoring system to consider, so you don’t notice the question.
If the Tuesday bet is considered to be “you can take the bet, but it will replace the one you may or may not have given on a previous day if their was one”, then things line up to half again.
I can think of two interpretations of the setup you’re describing here, but for both interpretations, Beauty does the right thing only if she thinks Heads has probability 1⁄3, not1⁄2.
Note that depending on the context, a probability of 1⁄2 for something does not necessarily lead one to bet on it at 1:1 odds. For instance, if based on almost no knowledge of baseball, you were to assign probability 1⁄2 to the Yankees winning their next game, it would be imprudent to offer to bet anyone that they will win, at 1:1 odds. It’s likely that the only people to take you up on this bet are the ones who have better knowledge, that leads them to think that the Yankees will not win. But even after having decided not to offer this bet, you should still think the probability of the Yankees winning is 1⁄2, assuming you have not obtained any new information.
I’ll assume that Beauty is always given the option of betting on Heads at 1:1 odds. From an outside view, it is clear that whatever she does, her expected gain is zero, so we hope that that is also her conclusion when analyzing the situation after each awakening. Of course, by shifting the payoffs when betting on Heads a bit (eg, win $0.90, lose $1) we can make betting on Heads either clearly desirable or clearly undesirable, and we’d hope that Beauty correctly decides in such cases.
So… first interpretation: At each awakening, Beauty can bet on Heads at 1:1 odds (winning $1 or losing $1), or decide not to bet. If she decides to bet on Tuesday, any bet she may have made on Monday is cancelled, and if she decides not to bet on Tuesday, any bet she made on Monday is also cancelled. In other words, what she does on Monday is irrelevant if the coin landed Tails, since she will in that case be woken on Tuesday and whatever she decides then will replace whatever decision she made on Monday. (Of course, if the coin landed Heads, her decision on Monday does take effect, since she isn’t woken on Tuesday.)
Beauty’s computation on awakening of the expected gain from betting on Heads (winning $1 or losing $1) can be written as follows:
The 0 gain from betting on Monday when the coin landed Tails is because in that situation her decision is irrelevant. The expected gain from not betting is of course zero.
If Beauty thinks that P(Heads)=1/3, and hence P(Heads&Monday)=1/3, P(Tails&Monday)=1/3, and P(Tails&Tuesday)=1/3, and the formula above evaluates to zero, as we would hope. Furthermore, if the payoffs deviate from 1:1, Beauty will decide to bet or not in the correct way.
If Beauty instead thinks that P(Heads)=1/2, then P(Heads&Monday)=1/2. Trying to guess what a Halfer would think, I’ll also assume that she thinks that P(Tails&Monday)=1/4 and P(Tails&Tuesday)=1/4. (Those who deny that these probabilities should sum to one, or maintain that the concept of probability is inapplicable to Beauty’s situation, are beyond the bounds of rational discussion.) If we plug these into the formula above, we get that Beauty thinks her expected gain from betting on Heads is $0.25, which is contrary to the outside view that it is zero. Furthermore, if we shift the payoffs a little bit, so a win betting on Heads gives $0.90 and a loss betting on Heads still costs $1, then Beauty will compute the expected gain from betting on Heads to be $0.45 minus $0.25, which is $0.20$, so she’s still enthusiastic about betting on Heads. But the outside view is that with these payoffs the expected gain is minus $0.05, so betting on Heads is a losing proposition.
Now… second interpretation: At each awakening, Beauty can bet or decide not to bet. If she decides to bet on Tuesday, any bet she may have made on Monday is cancelled, but if she decides not to bet on Tuesday, a bet she made on Monday is still in effect. In other words, she makes a bet if she decides to bet on either Monday or Tuesday, but if she decides to bet on both Monday and Tuesday, it only counts as one bet.
This one is more subtle. I answered essentially the same question in a comment on this blog post, so I’ll just copy the relevant portion below, showing how a Thirder will do the right thing (with other than 1:1 payoffs). Note that “invest” means essentially the same as “bet”:
- - -
First, to find the optimal strategy in this situation where investing when Heads yields −60 for Beauty, and investing when Tails yields +40 for Beauty, but only once if Beauty decides to invest on both wakenings, we need to consider random strategies in which with probability q, Beauty does not invest, and with probability (1-q) she does invest.
Beauty’s expected gain if she follows such a strategy is (1/2)40(1-q^2)-(1/2)60(1-q). Taking the derivative, we get −40q+30, and solving for this being zero gives q=3/4 as the optimal value. So Beauty should invest with probability 1⁄4, which works out to giving her an average return of (1/2)40(7/16)-(1/2)60(1/4)=5/4.
That’s all when working out the strategy beforehand. But what if Beauty re-thinks her strategy when she is woken? Will she change her mind, making this strategy inconsistent?
Well, she will think that with probability 1⁄3, the coin landed Heads, and the investment will lose 60, and with probability 2⁄3, the coin landed Tails, in which case the investment will gain 40, but only if she didn’t/doesn’t invest in her other wakening. If she follows the strategy worked out above, in her other wakening, she doesn’t invest with probability 3⁄4, and if Beauty thinks things over with that assumption, she will see her expected gain as (2/3)(3/4)40-(1/3)60=0. With a zero expected gain, Beauty would seem indifferent to investing or not investing, and so has no reason to depart from the randomized strategy worked out beforehand. We might hope that this reasoning would lead to a positive recommendation to randomize, not just acquiescence to doing that, but in Bayesian decision theory randomization is never required, so if there’s a problem here it’s with Bayesian decision theory, not with the Sleeping Beauty problem in particular.
- - -
On the other hand, if Beauty thinks the probability of Heads is 1⁄2, then she will think that investing has expected return of (1/2)(-60) + (1/2)40 or less (the 40 assumes she didn’t/doesn’t invest on the other awakening, if she did/does it would be 0). Since this is negative, she will change her mind and not invest, which is a mistake.
“If Beauty instead thinks that P(Heads)=1/2, then P(Heads&Monday)=1/2. Trying to guess what a Halfer would think, I’ll also assume that she thinks that P(Tails&Monday)=1/4 and P(Tails&Tuesday)=1/4”
This part of the analysis gets to the core of my opinion. One could argue that if the coin is flicked and comes up tails then we have both “Tails&Monday” and “Tails&Tuesday” as both being correct, sequentially. They are not mutually exclusive outcomes. One could also argue that, on a particular waking up Beauty doesn’t know which day it is and thus they are exclusive in that moment.*
Which gets me back to my point. What do you mean by “probability”. If we take a frequentist picture do we divide by the number of timelines (or equivalently separate experimental runs if many are run), or do we divide by the number of times beauty wakes up. I think either can be defended and that you should avoid the words “probability” or the notation P( X ) until after you have specified which denominator you are taking.
I like the idea of trying to evaluate the optimal policy for Beauty, given the rules of the game. Maybe you are asked to right a computer function (Beauty.py) that is going to play the game as described.
I liked the thought about indeteriminisitc strategies sometimes being the best. I cooked up an extreme example underneath here to (1) show how I think the “find the optimal policy approach” works in practice and to (2) give another example of randomness being a real friend.
Consider two very extreme cases of the sleeping beauty game:
Guess wrong and you die! (GWYD)
Guess right and you live! (GRYL)
In both instances beauty guesses whether the coin was heads of tails on each waking. After the experiment is over her guesses are considered. In the first game, one (or more) wrong guesses get beauty killed. In the Second a single right guess is needed, otherwise she is killed.
In the first game, GWYD, (I think) it is obvious that the deterministic strategy “guess heads” is just as good as the deterministic strategy “guess tails”. Importantly the fact that Beauty is sometimes woken twice is just not relevant in this game: because providing the same answer twice (once on each day) changes nothing. As both the “guess tails” and “guess heads” policies are considered equally good one could say that the revealed probability of heads is 1⁄2. (Although as I said previously I would avoid the word “probability” entirely and just talk about optimal strategies given some particular scoring system).
However, in the second game, GRYL, it is clear that a randomised strategy is best (assuming beauty has access to a random number generator). If the coin does come up tails then Beauty gets two guesses and (in GRYL) it would be very valuable for those two guesses to be different from one another. However, if it is heads she gets only one try.
I drew myself a little tree diagram expecting to find that in GRYL the probabilities revealed by the optimal policy would be thirder ones, but I actually found that the optimal strategy in this case is to guess heads with 50% probability, and tails with 50% probability (overall giving a survival chance of 5⁄8).
I had vaguely assumed that GWYD was a “halfers game” and GRYL was a “thirders”. Odd that they both give strategies that feel “halfer-y”. (It remains the case that in a game where right guesses are worth +1$ and wrong guesses −1$ that the thirder tactic is best, guess tails every time, expect to win 2/3rds).
* One could go even crazier and argue that experiences that are, by definition (amnesia drug) indistinguishable should be treated like indistinguishable particles.
One could argue that if the coin is flicked and comes up tails then we have both “Tails&Monday” and “Tails&Tuesday” as both being correct, sequentially.
Yes, it is a commonplace occurrence that “Today is Monday” and “Today is Tuesday” can both be true, on different days. This doesn’t ordinarily prevent people from assigning probabilities to statements like “Today is Monday”, when they happen to not remember for sure whether it is Monday or not now. And the situation is the same for Beauty—it is either Monday or Tuesday, she doesn’t know which, but she could find out if she just left the room and asked some passerby. Whether it is Monday or Tuesday is an aspect of the external world, which one normally regards as objectively existing regardless of one’s knowledge of it.
All this is painfully obvious. I think it’s not obvious to you because you don’t accept that Beauty is a human being, not a python program. Note also that Beauty’s experience on Monday is not the same as on Tuesday (if she is woken). Actual human beings don’t have exactly the same experiences on two different days, even if the have memory issues. The problem setup specifies only that these differences aren’t informative about whether it’s Monday or Tuesday.
What do you mean by “probability”.
I’m using “probability” in the subjective Bayesian sense of “degree of belief”. Since the question in the Sleeping Beauty problem is what probability of Heads should Beauty have when awoken, I can’t see how any other interpretation would address the question asked. Note that these subjective degree-of-belief probabilities are intended to be a useful guide to decision-making. If they lead one to make clearly bad decisions, they must be wrong.
Consider two very extreme cases of the sleeping beauty game: - Guess wrong and you die! (GWYD) - Guess right and you live! (GRYL)
If we look at the GWYD and GRYL scenarios you describe, we can, using an “outside” view, see what the optimal strategies are, based on how frequently Beauty survives in repeated instances of the problem. To see whether Beauty’s subjective probability of Heads should be 1⁄2 or 1⁄3, we can ask whether after working out an optimal strategy beforehand, Beauty will change her mind after waking up and judging that P(Heads) is either 1⁄2 or 1⁄3, and then using that to decide whether to follow the original strategy or not. If Beauty’s judgement of P(Heads) leads to her abandoning the optimal strategy, there must be something wrong with that P(Heads).
For GWYD, both the strategy of deterministically guessing Heads on all awakenings and the strategy of deterministically guessing Tails on all awakenings will give a survival probability of 1⁄2, which is optimal (I’ll omit the proof of this).
Suppose Beauty decides ahead of time to deterministically guess Tails. Will she change her mind and guess Heads instead when she wakes up?
Suppose that Beauty thinks that P(Heads)=1/3 upon wakening. She will then think that if she guesses Heads, her probability of surviving is P(Heads)=1/3. If instead, she guesses Tails, she thinks her probability of surviving is P(Tails & on other wakening she also guesses Tails), which is 2⁄3 if she is sure to follow the original plan in her other wakening, and is greater than 1⁄3 as long as she’s more likely than not to follow the original plan on her other wakening. So unless Beauty thinks she will do something perverse on her other wakening, she should think that following the original plan and guessing Tails is her best action.
Now suppose that Beauty thinks that P(Heads)=1/2 upon wakening. She will then think that if she guesses Heads, her probability of surviving is P(Heads)=1/2. If instead, she guesses Tails, she thinks her probability of surviving is P(Tails & on other wakening she also guesses Tails), which is 1⁄2 if she is sure to follow the original plan in her other wakening, and less than 1⁄2 if there is any chance that on her other wakening she doesn’t follow the original plan. Since Beauty is a human being, who at least once in a while does something strange or mistaken, the probability that she won’t follow the plan on her other wakening is surely not zero, so Beauty will judge her survival probability to be greater if she guesses Heads than if she guesses Tails, and abandon the original plan of guessing Tails.
Now, if Beauty thinks P(Heads)=1/2 and then always reasons in this way on wakening, then things turn out OK—despite having originally planned to always guess Tails, she actually always guesses Heads. But if there is a non-negligible chance that she follows the original plan without thinking much, she will end up dead more than half the time.
So if Beauty thinks P(Heads)=1/3, she does the right thing, but if Beauty thinks P(Heads)=1/2, she maybe does the right thing, but not really reliably.
Turning now to the GRYL scenario, we need to consider randomized strategies. Taking an outside view, suppose that Beauty follows the strategy of guessing heads with probability h, independently each time she wakes. Then the probability that she survives is S=(1/2)h+(1⁄2)(1-h*h) - that is, the probability the coin lands Heads (1/2) times the probability she guesses Heads, plus the probability that the coin lands Tails time the probability that she doesn’t guess Heads on both awakenings. The derivative of S with respect to h is (1/2)-h, which is zero when h=1/2, and one can verify this gives a maximum for S of 5⁄8 (better than the survival probability when deterministically guessing either Heads of Tails).
This matches what you concluded. However, Beauty guessing Heads with probability 1⁄2 does not imply that Beauty thinks P(Heads)=1/2. The “guess” here is not made in an attempt to guess the coin flip correctly, but rather in an attempt to not die. The mechanism for how the guess influences whether Beauty dies is crucial.
We can see this by seeing what Beauty will do after waking if she thinks that P(Heads)=1/2. She knows that her original strategy is to randomly guess, with Heads and Tails both having probability 1⁄2. But she can change her mind if this seems advisable. She will think that if she guesses Tails, her probability of survival will be P(Tails)=1/2 (she won’t survive if the coin landed Heads, because this will be her only (wrong) guess, and she will definitely survive if the coin landed Tails, regardless what she does on her other awakening). She will also compute that if she guesses Heads, she will survive with probability P(Heads)+P(Tails & she guesses Tails on her other wakening). Since P(Heads)=1/2, this will be greater than 1⁄2, since the second term surely is not exactly zero (it will be 1⁄4 if she follows the original strategy on her other awakening). So she will think that guessing Heads gives her a strictly greater chance of surviving than guessing Tails, and so will just guess Heads rather than following the original plan of guessing randomly.
The end result, if she reasons this way each awakening, is that she always guesses Heads, and hence survives with probability 1⁄2 rather than 5⁄8.
Now lets see what Beauty does if after wakening she thinks that P(Heads)=1/3. She will think that if she guesses Tails, her probability of survival will be P(Tails)=2/3. She will think that if she guesses Heads, her probability of survival will be P(Heads)+P(Tails & she guesses Tails on her other wakening). If she thinks she will follow the original strategy on her other wakening, the this is (1/3)+(2/3)*(1/2)=2/3. Since she computes her probability of survival to be the same whether she guesses Heads or Tails, she has no reason to depart from the original strategy of guessing Heads or Tails randomly. (And this reinforces her assumption that on another wakening she would follow the original strategy.)
So if Beauty is a Thirder, she lives with probability 5⁄8, but if she is a Halfer, she lives with lower probability of 1⁄2.
We are in complete agreement about how beauty should strategize given each of the three games (bet a dollar on the coin flick with odds K, GWYL and GWYD). The only difference is that you are insisting that “1/3” is Beauty’s “degree of belief”. (By the way I am glad you repeated the same maths I did for GWYD, it was simple enough but the answer felt surprising so I am glad you got the same.)
In contrast, I think we actually have two quantities:
“Quobability”—The frequency of correct guesses made divided by the total number of guesses made.
“Srobability”—The frequency of trials in which the correct guess was made, divided by the number of trials.
Quabability is 1⁄3, Scrobability is 1⁄2. “Probability” is (I think) an under-precise term that could mean either of the two.
You say you are a Bayesian, not a frequentist. So for you “probability” is degree of belief. I would also consider myself a Bayesian, and I would say that normally I can express my degree of belief with a single number, but that in this case I want to give two numbers, “Quobability = 1⁄3, Scrobability =1/2″. What I like about giving two numbers is that typically a Bayesian’s single probability value given is indicative of how they would bet. In this case the two quantities are both needed to see how I would bet given slightly varies betting rules.
I was still interested in “GRYL”, which I had originally assumed would support the thirder position, but (for a normal coin) had the optimal tactic being to pick at 50⁄50 odds. I just looked at biased coins.
For a biased coin that (when flicked normally, without any sleep or anmesia) comes up heads with probability k. (k on x-axis), I assume Beauty is playing GRYL, and that she is guessing heads with some probability. The optimal probability for her strategy to take is on the y-axis (blue line). Overall chance of survival is orange line.
You are completely correct that her guess is in no way related to the actual coin bias (k), except for k=0.5 specifically which is an exception not the rule. In fact, this graph appears to be vaguely pushing in some kind of thirder position, in that the value 2/3rds takes on special significance as beyond this point beauty always guesses heads. In contrast when tails is more likely she still keeps some chance of guessing heads because she is banking on one of her two tries coming up tails in the case of tails, so she can afford some preparation for heads.
“Quobability”—The frequency of correct guesses made divided by the total number of guesses made.
“Srobability”—The frequency of trials in which the correct guess was made, divided by the number of trials.
Quabability is 1⁄3, Scrobability is 1⁄2. “Probability” is (I think) an under-precise term that could mean either of the two.
I suspect that the real problem isn’t with the word “probability”, but rather the word “guess”. In everyday usage, we use “guess” when the aim is to guess correctly. But the aim here is to not die.
Suppose we rephrase the GRYL scenario to say that Beauty at each awakening takes one of two actions—“action H” or “action T”. If the coin lands Heads, and Beauty takes action H the one time she is woken, then she lives (if she instead takes action T, she dies). If the coin lands Tails, and Beauty takes action T at least one of the two times she is woken, then she lives (if she takes action H both times, she dies).
Having eliminated the word “guess”, why would one think that Beauty’s use of the strategy of randomly taking action H or action T with equal probabilities implies that she must have P(Heads)=1/2? As I’ve shown above, that strategy is actually only compatible with her belief being that P(Heads)=1/3.
Note that in general, the “action space” for a decision theory problem need not be the same as the “state space”. One might, for example, have some uncertain information about what day of the week it is (7 possibilities) and on that basis decide whether to order pepperoni, anchovy, or ham pizza (3 possibilities). (You know that different people, with different skills, usually make the pizza on different days.) So if for some reason you randomized your choice of action, it would certainly not say anything directly about your probabilities for the different days of the week.
Maybe we are starting to go in circles. But while I agree the word “guess” might be problematic I think you still have an ambiguity with what the word probability means in this case. Perhaps you could give the definition you would use for the word “probability”.
“In everyday usage, we use “guess” when the aim is to guess correctly.” Guess correctly in the largest proportion of trials, or in the largest proportion of guesses? I think my “scrob” and “quob” thingies are indeed aiming to guess correctly. One in the most possible trials, the other in the most possible individual instances of making a guess.
“Having eliminated the word “guess”, why would one think that Beauty’s use of the strategy of randomly taking action H or action T with equal probabilities implies that she must have P(Heads)=1/2?”—I initially conjectured this as weak evidence, but no longer hold the position at all, as I explained in the post with the graph. However, I still think that in the other death-scenario (Guess Wrong you die) the fact that deterministically picking heads is equally good to deterministically picking tails says something. This GWYD case sets the rules of the wager such that Beauty is trying to be right in as many trials as possible, instead of for as many individual awakenings. Clearly moving the goalposts to the “number of trials” denominator.
For me, the issue is that you appear to take “probability” as “obviously” meaning “proportion of awakenings”. I do not think this is forced on us by anything, and that both denominators (awakenings and trials) provide us with useful information that can beneficially inform our decision making, depending on whether we want to be right in as many awakenings or trials as possible. Perhaps you could explain your position while tabooing the word “probability”? Because, I think we have entered the Tree-falling-in-forest zone: https://www.lesswrong.com/posts/7X2j8HAkWdmMoS8PE/disputing-definitions, and I have tried to split our problem term in two (Quobability and Srobability) but it hasn’t helped.
Perhaps you could give the definition you would use for the word “probability”.
I define it as one’s personal degree of belief in a proposition, at the time the judgement of probability is being made. It has meaning only in so far it is (or may be) used to make a decision, or is part of a general world model that is itself meaningful. (For example, we might assign a probability to Jupiter having a solid core, even though that makes no difference to anything we plan to do, because that proposition is part of an overall theory of physics that is meaningful.)
Frequentist ideas about probability being related to the proportion of times that an event occurs in repetitions of a scenario are not part of this definition, so the question of what denominator to use does not arise. (Looking at frequentist concepts can sometimes be a useful sanity check on whether probability judgements make sense, but if there’s some conflict between frequentist and Bayesian results, the solution is to re-examine the Bayesian results, to see if you made a mistake, or to understand why the frequentist results don’t actually contradict the Bayesian result.)
If you make the right probability judgements, you are supposed to make the right decision, if you correctly apply decision theory. And Beauty does make the right decision in all the Sleeping Beauty scenarios if she judges that P(Heads)=1/3 when woken before Wednesday. She doesn’t make the right decision if she judges that P(Heads)=1/2. I emphasize that this is so for all the scenarios. Beauty doesn’t have to ask herself, “what denominator should I be using?”. P(Heads)=1/3 gives the right answer every time.
Another very useful property of probability judgements is that they can be used for multiple decisions, without change. Suppose, for example, that in the GWYD or GRYL scenarios, in addition to trying not to die, Beauty is also interested in muffins.
Specifically, she knows from the start that whenever she wakes up there will be a plate of freshly-baked muffins on her side table, purchased from the cafe down the road. She knows this cafe well, and in particular knows that (a) their muffins are always very delicious, and (b) on Tuesdays, but not Mondays, the person who bakes the muffins adds an ingredient that gives her a stomach ache 10 minutes after eating a muffin. Balancing these utilities, she decides to eat the muffins if the probability of it being Tuesday is less than 30%. If Beauty is a Thirder, she will judge the probability of Tuesday to be 1⁄3, and refrain from eating the muffins, but if Beauty is a Halfer, she will (I think, trying to pretend I’m a halfer) think the probability of Tuesday is 1⁄4, and eat the muffins.
The point here is not so much which decision is correct (though of course I think the Thirder decision is right), but that whatever the right decision is, it shouldn’t depend on whether Beauty is in the GWYD or GRYL scenario. She shouldn’t be considering “denominators”.
I agree that you can make a betting/scoring setup such that betting/predicting at 50% is correct. Eg,suppose that on both Monday and Tuesday, Beauty gets to make a $1 bet at some odds. If she’s asleep on Tuesday then whatever bet she made on Monday is repeated. In that case she should bet at 1:1 odds.
Let’s work it out if Beauty has 1⁄3 probability for Heads, with the 2⁄3 probability for Tails split evenly between 1⁄3 for Tails&Monday and 1⁄3 for Tails&Tuesday.
Here, the possible actions are “don’t bet” or “bet on Heads”, at 1:1 odds (let’s say win $1 or lose $1), on each day for which a bet is active.
The expected gain from not betting is of course zero. The expected gain from betting on some day can be computed by summing the gains in each situation (Heads&Monday, Tails&Monday, Tails&Tuesday) times the corresponding probabilities:
P(Heads&Monday) x 2 + P(Tails&Monday) x (-1) + P(Tails&Tuesday) x (-1)
This evaluates to zero. Note that the gain for betting in the Heads&Monday situation is $2 because if she makes this bet, it is active for two days, on each of which it wins $1. Since the expected gain is zero for both betting and not betting, Beauty is indifferent to betting on Heads or not betting. If the odds shifted a bit, so winning the bet paid $1 but losing the bet cost $1.06, then the expected gain would be minus $0.04, and Beauty should not make the bet on Heads.
Note that when Beauty is woken on both Monday and Tuesday, we expect that she will make the same decision both days, since she has no reason to make different decisions, but this is not some sort of logical guarantee—Beauty is a human being, who makes decisions for herself at each moment in time, which can conceivably (but perhaps with very low probability) be inconsistent with other decisions she makes at other times.
Now suppose that Beauty’s probability for Heads is 1⁄2, so that P(Heads&Monday)=1/2, P(Tails&Monday)=1/4, and P(Tails&Tuesday)=1/4, though the split between Tails&Monday and Tails&Tuesday doesn’t actually matter in this scenario.
Plugging these probabilities into the formula above, we compute the expected gain to be $0.50. So Beauty is not indifferent, but instead will definitely want to bet on Heads. If we shift the payoffs to win $1 lose $1.06, then the expected gain is $0.47, so Beauty will still want to bet on Heads.
However, when a win pays $1 and a loss costs $1.06, arguments from an outside perspective say that if every day she is woken Beauty follows this strategy of betting on Heads, her expected gain is minus $0.03 per experiment. [Edit: Correction, the expected gain per experiment is minus $0.06 - I forgot to multiply by two for the bet being made twice.]
So we see that Beauty makes a mistake if her probability for Heads is 1⁄2. She makes the right decision (if she applies standard decision theory) if her probability of Heads is 1⁄3.
Note that this is about how much money Beauty gets. One can get the right answer by applying standard decision theory. There is no role for some arbitrary “scoring setup”.
Thank you for sharing that Bayesian Beauty paper, its very clear.
I think the OP is onto something useful when they distinguish between (my phrasing) the approach such that “my probabilities are calibrated such that they represent the proportion of timelines in which I am right” and the approach that means “my probabilities are calibrated with respect to the number of times I answer questions about them”.
I take the “timelines” (first) option. The second one seems odd to me, for example say someone considers a multiple choice question, decides that answer (a) is correct. They are told they have to wait in the examination room while other candidates finish. During this 10 minute wait they spend 9 minutes confident (a) was the right choice, then with 1 minute to go, realise (b) is actually right and change their answer. Here I would only judge the correctness (or otherwise) of the final choice, (b), and it would not be relevant that they spent more time believing (a). Similarly, if Beauty is woken for a second time any guesses she makes about the coin on this second waking should be taken to replace any guesses she made on the first waking (if you take the timelines point of view). I think we should only mark “final answers”, otherwise we end up in really weird territory where if you think something is true with 2/3rds probability, but you used to think it was true with 1/3rd probability, then you should exaggerate your newfound belief that it is more likely in order to be more right on a temporal average.
Argument against marking only final answers: on Wednesday Beauty wakes up, and learns that the experiment has concluded. At this time her credence for P(heads) is 1⁄2. This is true both on a per-Wednesday-awakening basis and on a per-coin-flip basis, regardless of whether she is a thirder, halfer, or something else. If we only mark final answers, the question of the correct probability on Monday and Tuesday is left with no answer.
I agree that you can make a betting/scoring setup such that betting/predicting at 50% is correct. Eg, suppose that on both Monday and Tuesday, Beauty gets to make a $1 bet at some odds. If she’s asleep on Tuesday then whatever bet she made on Monday is repeated. In that case she should bet at 1:1 odds. With other scoring setups we can get different answers. Overall, this makes me think that this approach is misguided, unless we have a convincing argument about which scoring setup is correct.
I think the exam hypothetical creates misleading intuitions because written exams are generally graded based on the final result. Whereas if this was an oral examination in spoken language, a candidate who spent the first nine minutes with incorrect beliefs about grammatical gender would lose points. But certainly if I knew my life was being scored based on my probabilities at the moment of death, I would optimize for dying in a place of accurate certainty.
Yes, the Wednesday point is a good one, so it the oral exam comparison.
I think we agree that the details of the “scoring system” completely change the approach beauty should take. This is not true for most probability questions. Like, if she can bet 1$ at some odds each time she wakes up then it makes sense for her policy going in to more heavily weight the timeline in which she gets to bet twice. As you point out if her sleeping self repeats bets that changes things. If the Tuesday bet is considered to be “you can take the bet, but it will replace the one you may or may not have given on a previous day if their was one”, then things line up to half again. If she has to guess the correct outcome of the coin flip, or else die once the experiment is over, then the strategy where she always guesses heads is just as good as the one where she always guesses tails. Her possible submissions are [H, T, HH, TT], two of which result in death. Where we differ is that you think the details of the scoring system being relevant suggest the approach is misguided. In contrast I think the fact that scoring system details matter is the entire heart of the problem. If I take probabilities as “how I should bet” then the details of the bet should matter. If I take probabilities as frequencies then I need to decide whether the denominator is “per timeline” or “per guess”. I don’t think the situation allows one to avoid these choices, and (at least to me) once these are identified as choices, with neither option pushed upon us by probability theory, the whole situation appears to make sense.
Frequentist example—If you understood me above you certainly don’t need this example so skip ---: The experiment is repeated 100 times, with 50 heads and 50 tails on the coin flip. A total of 150 guesses are made by beauty. BeautyV1.0 said the probability was 50⁄50 every time. This means that for 100 of the times she answered 50⁄50 it was actually tails, and 50 times she answered this same way it was actually heads. So she is poorly calibrated with respect to the number of times she answered. By this scoring system BeautyV2.0 who says “1/3” appears better.
However, in every single trial, even the ones where beauty is consulted twice, she only provides one unique answer. If beauty is a simple program call in a simulation then she is a memoryless program, in some cases evaluated twice. In half the trials it was heads, in half it was tails, in every trial BeautyV1.0 said it was 50⁄50. So she is well calibrated per trial, but not per evaluation. BeautyV2.0 is the other way around.
I believe this IS true for most probability questions. However, most probability questions have only one reasonable scoring system to consider, so you don’t notice the question.
If the Tuesday bet is considered to be “you can take the bet, but it will replace the one you may or may not have given on a previous day if their was one”, then things line up to half again.
I can think of two interpretations of the setup you’re describing here, but for both interpretations, Beauty does the right thing only if she thinks Heads has probability 1⁄3, not 1⁄2.
Note that depending on the context, a probability of 1⁄2 for something does not necessarily lead one to bet on it at 1:1 odds. For instance, if based on almost no knowledge of baseball, you were to assign probability 1⁄2 to the Yankees winning their next game, it would be imprudent to offer to bet anyone that they will win, at 1:1 odds. It’s likely that the only people to take you up on this bet are the ones who have better knowledge, that leads them to think that the Yankees will not win. But even after having decided not to offer this bet, you should still think the probability of the Yankees winning is 1⁄2, assuming you have not obtained any new information.
I’ll assume that Beauty is always given the option of betting on Heads at 1:1 odds. From an outside view, it is clear that whatever she does, her expected gain is zero, so we hope that that is also her conclusion when analyzing the situation after each awakening. Of course, by shifting the payoffs when betting on Heads a bit (eg, win $0.90, lose $1) we can make betting on Heads either clearly desirable or clearly undesirable, and we’d hope that Beauty correctly decides in such cases.
So… first interpretation: At each awakening, Beauty can bet on Heads at 1:1 odds (winning $1 or losing $1), or decide not to bet. If she decides to bet on Tuesday, any bet she may have made on Monday is cancelled, and if she decides not to bet on Tuesday, any bet she made on Monday is also cancelled. In other words, what she does on Monday is irrelevant if the coin landed Tails, since she will in that case be woken on Tuesday and whatever she decides then will replace whatever decision she made on Monday. (Of course, if the coin landed Heads, her decision on Monday does take effect, since she isn’t woken on Tuesday.)
Beauty’s computation on awakening of the expected gain from betting on Heads (winning $1 or losing $1) can be written as follows:
P(Heads&Monday)*1 + P(Tails&Monday)*0 + P(Tails&Tuesday)*(-1)
The 0 gain from betting on Monday when the coin landed Tails is because in that situation her decision is irrelevant. The expected gain from not betting is of course zero.
If Beauty thinks that P(Heads)=1/3, and hence P(Heads&Monday)=1/3, P(Tails&Monday)=1/3, and P(Tails&Tuesday)=1/3, and the formula above evaluates to zero, as we would hope. Furthermore, if the payoffs deviate from 1:1, Beauty will decide to bet or not in the correct way.
If Beauty instead thinks that P(Heads)=1/2, then P(Heads&Monday)=1/2. Trying to guess what a Halfer would think, I’ll also assume that she thinks that P(Tails&Monday)=1/4 and P(Tails&Tuesday)=1/4. (Those who deny that these probabilities should sum to one, or maintain that the concept of probability is inapplicable to Beauty’s situation, are beyond the bounds of rational discussion.) If we plug these into the formula above, we get that Beauty thinks her expected gain from betting on Heads is $0.25, which is contrary to the outside view that it is zero. Furthermore, if we shift the payoffs a little bit, so a win betting on Heads gives $0.90 and a loss betting on Heads still costs $1, then Beauty will compute the expected gain from betting on Heads to be $0.45 minus $0.25, which is $0.20$, so she’s still enthusiastic about betting on Heads. But the outside view is that with these payoffs the expected gain is minus $0.05, so betting on Heads is a losing proposition.
Now… second interpretation: At each awakening, Beauty can bet or decide not to bet. If she decides to bet on Tuesday, any bet she may have made on Monday is cancelled, but if she decides not to bet on Tuesday, a bet she made on Monday is still in effect. In other words, she makes a bet if she decides to bet on either Monday or Tuesday, but if she decides to bet on both Monday and Tuesday, it only counts as one bet.
This one is more subtle. I answered essentially the same question in a comment on this blog post, so I’ll just copy the relevant portion below, showing how a Thirder will do the right thing (with other than 1:1 payoffs). Note that “invest” means essentially the same as “bet”:
- - -
First, to find the optimal strategy in this situation where investing when Heads yields −60 for Beauty, and investing when Tails yields +40 for Beauty, but only once if Beauty decides to invest on both wakenings, we need to consider random strategies in which with probability q, Beauty does not invest, and with probability (1-q) she does invest.
Beauty’s expected gain if she follows such a strategy is (1/2)40(1-q^2)-(1/2)60(1-q). Taking the derivative, we get −40q+30, and solving for this being zero gives q=3/4 as the optimal value. So Beauty should invest with probability 1⁄4, which works out to giving her an average return of (1/2)40(7/16)-(1/2)60(1/4)=5/4.
That’s all when working out the strategy beforehand. But what if Beauty re-thinks her strategy when she is woken? Will she change her mind, making this strategy inconsistent?
Well, she will think that with probability 1⁄3, the coin landed Heads, and the investment will lose 60, and with probability 2⁄3, the coin landed Tails, in which case the investment will gain 40, but only if she didn’t/doesn’t invest in her other wakening. If she follows the strategy worked out above, in her other wakening, she doesn’t invest with probability 3⁄4, and if Beauty thinks things over with that assumption, she will see her expected gain as (2/3)(3/4)40-(1/3)60=0. With a zero expected gain, Beauty would seem indifferent to investing or not investing, and so has no reason to depart from the randomized strategy worked out beforehand. We might hope that this reasoning would lead to a positive recommendation to randomize, not just acquiescence to doing that, but in Bayesian decision theory randomization is never required, so if there’s a problem here it’s with Bayesian decision theory, not with the Sleeping Beauty problem in particular.
- - -
On the other hand, if Beauty thinks the probability of Heads is 1⁄2, then she will think that investing has expected return of (1/2)(-60) + (1/2)40 or less (the 40 assumes she didn’t/doesn’t invest on the other awakening, if she did/does it would be 0). Since this is negative, she will change her mind and not invest, which is a mistake.
“If Beauty instead thinks that P(Heads)=1/2, then P(Heads&Monday)=1/2. Trying to guess what a Halfer would think, I’ll also assume that she thinks that P(Tails&Monday)=1/4 and P(Tails&Tuesday)=1/4”
This part of the analysis gets to the core of my opinion. One could argue that if the coin is flicked and comes up tails then we have both “Tails&Monday” and “Tails&Tuesday” as both being correct, sequentially. They are not mutually exclusive outcomes. One could also argue that, on a particular waking up Beauty doesn’t know which day it is and thus they are exclusive in that moment.*
Which gets me back to my point. What do you mean by “probability”. If we take a frequentist picture do we divide by the number of timelines (or equivalently separate experimental runs if many are run), or do we divide by the number of times beauty wakes up. I think either can be defended and that you should avoid the words “probability” or the notation P( X ) until after you have specified which denominator you are taking.
I like the idea of trying to evaluate the optimal policy for Beauty, given the rules of the game. Maybe you are asked to right a computer function (Beauty.py) that is going to play the game as described.
I liked the thought about indeteriminisitc strategies sometimes being the best. I cooked up an extreme example underneath here to (1) show how I think the “find the optimal policy approach” works in practice and to (2) give another example of randomness being a real friend.
Consider two very extreme cases of the sleeping beauty game:
Guess wrong and you die! (GWYD)
Guess right and you live! (GRYL)
In both instances beauty guesses whether the coin was heads of tails on each waking. After the experiment is over her guesses are considered. In the first game, one (or more) wrong guesses get beauty killed. In the Second a single right guess is needed, otherwise she is killed.
In the first game, GWYD, (I think) it is obvious that the deterministic strategy “guess heads” is just as good as the deterministic strategy “guess tails”. Importantly the fact that Beauty is sometimes woken twice is just not relevant in this game: because providing the same answer twice (once on each day) changes nothing. As both the “guess tails” and “guess heads” policies are considered equally good one could say that the revealed probability of heads is 1⁄2. (Although as I said previously I would avoid the word “probability” entirely and just talk about optimal strategies given some particular scoring system).
However, in the second game, GRYL, it is clear that a randomised strategy is best (assuming beauty has access to a random number generator). If the coin does come up tails then Beauty gets two guesses and (in GRYL) it would be very valuable for those two guesses to be different from one another. However, if it is heads she gets only one try.
I drew myself a little tree diagram expecting to find that in GRYL the probabilities revealed by the optimal policy would be thirder ones, but I actually found that the optimal strategy in this case is to guess heads with 50% probability, and tails with 50% probability (overall giving a survival chance of 5⁄8).
I had vaguely assumed that GWYD was a “halfers game” and GRYL was a “thirders”. Odd that they both give strategies that feel “halfer-y”. (It remains the case that in a game where right guesses are worth +1$ and wrong guesses −1$ that the thirder tactic is best, guess tails every time, expect to win 2/3rds).
* One could go even crazier and argue that experiences that are, by definition (amnesia drug) indistinguishable should be treated like indistinguishable particles.
One could argue that if the coin is flicked and comes up tails then we have both “Tails&Monday” and “Tails&Tuesday” as both being correct, sequentially.
Yes, it is a commonplace occurrence that “Today is Monday” and “Today is Tuesday” can both be true, on different days. This doesn’t ordinarily prevent people from assigning probabilities to statements like “Today is Monday”, when they happen to not remember for sure whether it is Monday or not now. And the situation is the same for Beauty—it is either Monday or Tuesday, she doesn’t know which, but she could find out if she just left the room and asked some passerby. Whether it is Monday or Tuesday is an aspect of the external world, which one normally regards as objectively existing regardless of one’s knowledge of it.
All this is painfully obvious. I think it’s not obvious to you because you don’t accept that Beauty is a human being, not a python program. Note also that Beauty’s experience on Monday is not the same as on Tuesday (if she is woken). Actual human beings don’t have exactly the same experiences on two different days, even if the have memory issues. The problem setup specifies only that these differences aren’t informative about whether it’s Monday or Tuesday.
What do you mean by “probability”.
I’m using “probability” in the subjective Bayesian sense of “degree of belief”. Since the question in the Sleeping Beauty problem is what probability of Heads should Beauty have when awoken, I can’t see how any other interpretation would address the question asked. Note that these subjective degree-of-belief probabilities are intended to be a useful guide to decision-making. If they lead one to make clearly bad decisions, they must be wrong.
Consider two very extreme cases of the sleeping beauty game: - Guess wrong and you die! (GWYD) - Guess right and you live! (GRYL)
If we look at the GWYD and GRYL scenarios you describe, we can, using an “outside” view, see what the optimal strategies are, based on how frequently Beauty survives in repeated instances of the problem. To see whether Beauty’s subjective probability of Heads should be 1⁄2 or 1⁄3, we can ask whether after working out an optimal strategy beforehand, Beauty will change her mind after waking up and judging that P(Heads) is either 1⁄2 or 1⁄3, and then using that to decide whether to follow the original strategy or not. If Beauty’s judgement of P(Heads) leads to her abandoning the optimal strategy, there must be something wrong with that P(Heads).
For GWYD, both the strategy of deterministically guessing Heads on all awakenings and the strategy of deterministically guessing Tails on all awakenings will give a survival probability of 1⁄2, which is optimal (I’ll omit the proof of this).
Suppose Beauty decides ahead of time to deterministically guess Tails. Will she change her mind and guess Heads instead when she wakes up?
Suppose that Beauty thinks that P(Heads)=1/3 upon wakening. She will then think that if she guesses Heads, her probability of surviving is P(Heads)=1/3. If instead, she guesses Tails, she thinks her probability of surviving is P(Tails & on other wakening she also guesses Tails), which is 2⁄3 if she is sure to follow the original plan in her other wakening, and is greater than 1⁄3 as long as she’s more likely than not to follow the original plan on her other wakening. So unless Beauty thinks she will do something perverse on her other wakening, she should think that following the original plan and guessing Tails is her best action.
Now suppose that Beauty thinks that P(Heads)=1/2 upon wakening. She will then think that if she guesses Heads, her probability of surviving is P(Heads)=1/2. If instead, she guesses Tails, she thinks her probability of surviving is P(Tails & on other wakening she also guesses Tails), which is 1⁄2 if she is sure to follow the original plan in her other wakening, and less than 1⁄2 if there is any chance that on her other wakening she doesn’t follow the original plan. Since Beauty is a human being, who at least once in a while does something strange or mistaken, the probability that she won’t follow the plan on her other wakening is surely not zero, so Beauty will judge her survival probability to be greater if she guesses Heads than if she guesses Tails, and abandon the original plan of guessing Tails.
Now, if Beauty thinks P(Heads)=1/2 and then always reasons in this way on wakening, then things turn out OK—despite having originally planned to always guess Tails, she actually always guesses Heads. But if there is a non-negligible chance that she follows the original plan without thinking much, she will end up dead more than half the time.
So if Beauty thinks P(Heads)=1/3, she does the right thing, but if Beauty thinks P(Heads)=1/2, she maybe does the right thing, but not really reliably.
Turning now to the GRYL scenario, we need to consider randomized strategies. Taking an outside view, suppose that Beauty follows the strategy of guessing heads with probability h, independently each time she wakes. Then the probability that she survives is S=(1/2)h+(1⁄2)(1-h*h) - that is, the probability the coin lands Heads (1/2) times the probability she guesses Heads, plus the probability that the coin lands Tails time the probability that she doesn’t guess Heads on both awakenings. The derivative of S with respect to h is (1/2)-h, which is zero when h=1/2, and one can verify this gives a maximum for S of 5⁄8 (better than the survival probability when deterministically guessing either Heads of Tails).
This matches what you concluded. However, Beauty guessing Heads with probability 1⁄2 does not imply that Beauty thinks P(Heads)=1/2. The “guess” here is not made in an attempt to guess the coin flip correctly, but rather in an attempt to not die. The mechanism for how the guess influences whether Beauty dies is crucial.
We can see this by seeing what Beauty will do after waking if she thinks that P(Heads)=1/2. She knows that her original strategy is to randomly guess, with Heads and Tails both having probability 1⁄2. But she can change her mind if this seems advisable. She will think that if she guesses Tails, her probability of survival will be P(Tails)=1/2 (she won’t survive if the coin landed Heads, because this will be her only (wrong) guess, and she will definitely survive if the coin landed Tails, regardless what she does on her other awakening). She will also compute that if she guesses Heads, she will survive with probability P(Heads)+P(Tails & she guesses Tails on her other wakening). Since P(Heads)=1/2, this will be greater than 1⁄2, since the second term surely is not exactly zero (it will be 1⁄4 if she follows the original strategy on her other awakening). So she will think that guessing Heads gives her a strictly greater chance of surviving than guessing Tails, and so will just guess Heads rather than following the original plan of guessing randomly.
The end result, if she reasons this way each awakening, is that she always guesses Heads, and hence survives with probability 1⁄2 rather than 5⁄8.
Now lets see what Beauty does if after wakening she thinks that P(Heads)=1/3. She will think that if she guesses Tails, her probability of survival will be P(Tails)=2/3. She will think that if she guesses Heads, her probability of survival will be P(Heads)+P(Tails & she guesses Tails on her other wakening). If she thinks she will follow the original strategy on her other wakening, the this is (1/3)+(2/3)*(1/2)=2/3. Since she computes her probability of survival to be the same whether she guesses Heads or Tails, she has no reason to depart from the original strategy of guessing Heads or Tails randomly. (And this reinforces her assumption that on another wakening she would follow the original strategy.)
So if Beauty is a Thirder, she lives with probability 5⁄8, but if she is a Halfer, she lives with lower probability of 1⁄2.
We are in complete agreement about how beauty should strategize given each of the three games (bet a dollar on the coin flick with odds K, GWYL and GWYD). The only difference is that you are insisting that “1/3” is Beauty’s “degree of belief”. (By the way I am glad you repeated the same maths I did for GWYD, it was simple enough but the answer felt surprising so I am glad you got the same.)
In contrast, I think we actually have two quantities:
“Quobability”—The frequency of correct guesses made divided by the total number of guesses made.
“Srobability”—The frequency of trials in which the correct guess was made, divided by the number of trials.
Quabability is 1⁄3, Scrobability is 1⁄2. “Probability” is (I think) an under-precise term that could mean either of the two.
You say you are a Bayesian, not a frequentist. So for you “probability” is degree of belief. I would also consider myself a Bayesian, and I would say that normally I can express my degree of belief with a single number, but that in this case I want to give two numbers, “Quobability = 1⁄3, Scrobability =1/2″. What I like about giving two numbers is that typically a Bayesian’s single probability value given is indicative of how they would bet. In this case the two quantities are both needed to see how I would bet given slightly varies betting rules.
I was still interested in “GRYL”, which I had originally assumed would support the thirder position, but (for a normal coin) had the optimal tactic being to pick at 50⁄50 odds. I just looked at biased coins.
For a biased coin that (when flicked normally, without any sleep or anmesia) comes up heads with probability k. (k on x-axis), I assume Beauty is playing GRYL, and that she is guessing heads with some probability. The optimal probability for her strategy to take is on the y-axis (blue line). Overall chance of survival is orange line.
You are completely correct that her guess is in no way related to the actual coin bias (k), except for k=0.5 specifically which is an exception not the rule. In fact, this graph appears to be vaguely pushing in some kind of thirder position, in that the value 2/3rds takes on special significance as beyond this point beauty always guesses heads. In contrast when tails is more likely she still keeps some chance of guessing heads because she is banking on one of her two tries coming up tails in the case of tails, so she can afford some preparation for heads.
CODE
By “GWYL” do you actually mean “GRYL” (ie, Guess Right You Live)?
Yes I do, very good point!
I think we actually have two quantities:
“Quobability”—The frequency of correct guesses made divided by the total number of guesses made.
“Srobability”—The frequency of trials in which the correct guess was made, divided by the number of trials.
Quabability is 1⁄3, Scrobability is 1⁄2. “Probability” is (I think) an under-precise term that could mean either of the two.
I suspect that the real problem isn’t with the word “probability”, but rather the word “guess”. In everyday usage, we use “guess” when the aim is to guess correctly. But the aim here is to not die.
Suppose we rephrase the GRYL scenario to say that Beauty at each awakening takes one of two actions—“action H” or “action T”. If the coin lands Heads, and Beauty takes action H the one time she is woken, then she lives (if she instead takes action T, she dies). If the coin lands Tails, and Beauty takes action T at least one of the two times she is woken, then she lives (if she takes action H both times, she dies).
Having eliminated the word “guess”, why would one think that Beauty’s use of the strategy of randomly taking action H or action T with equal probabilities implies that she must have P(Heads)=1/2? As I’ve shown above, that strategy is actually only compatible with her belief being that P(Heads)=1/3.
Note that in general, the “action space” for a decision theory problem need not be the same as the “state space”. One might, for example, have some uncertain information about what day of the week it is (7 possibilities) and on that basis decide whether to order pepperoni, anchovy, or ham pizza (3 possibilities). (You know that different people, with different skills, usually make the pizza on different days.) So if for some reason you randomized your choice of action, it would certainly not say anything directly about your probabilities for the different days of the week.
Maybe we are starting to go in circles. But while I agree the word “guess” might be problematic I think you still have an ambiguity with what the word probability means in this case. Perhaps you could give the definition you would use for the word “probability”.
“In everyday usage, we use “guess” when the aim is to guess correctly.” Guess correctly in the largest proportion of trials, or in the largest proportion of guesses? I think my “scrob” and “quob” thingies are indeed aiming to guess correctly. One in the most possible trials, the other in the most possible individual instances of making a guess.
“Having eliminated the word “guess”, why would one think that Beauty’s use of the strategy of randomly taking action H or action T with equal probabilities implies that she must have P(Heads)=1/2?”—I initially conjectured this as weak evidence, but no longer hold the position at all, as I explained in the post with the graph. However, I still think that in the other death-scenario (Guess Wrong you die) the fact that deterministically picking heads is equally good to deterministically picking tails says something. This GWYD case sets the rules of the wager such that Beauty is trying to be right in as many trials as possible, instead of for as many individual awakenings. Clearly moving the goalposts to the “number of trials” denominator.
For me, the issue is that you appear to take “probability” as “obviously” meaning “proportion of awakenings”. I do not think this is forced on us by anything, and that both denominators (awakenings and trials) provide us with useful information that can beneficially inform our decision making, depending on whether we want to be right in as many awakenings or trials as possible. Perhaps you could explain your position while tabooing the word “probability”? Because, I think we have entered the Tree-falling-in-forest zone: https://www.lesswrong.com/posts/7X2j8HAkWdmMoS8PE/disputing-definitions, and I have tried to split our problem term in two (Quobability and Srobability) but it hasn’t helped.
Perhaps you could give the definition you would use for the word “probability”.
I define it as one’s personal degree of belief in a proposition, at the time the judgement of probability is being made. It has meaning only in so far it is (or may be) used to make a decision, or is part of a general world model that is itself meaningful. (For example, we might assign a probability to Jupiter having a solid core, even though that makes no difference to anything we plan to do, because that proposition is part of an overall theory of physics that is meaningful.)
Frequentist ideas about probability being related to the proportion of times that an event occurs in repetitions of a scenario are not part of this definition, so the question of what denominator to use does not arise. (Looking at frequentist concepts can sometimes be a useful sanity check on whether probability judgements make sense, but if there’s some conflict between frequentist and Bayesian results, the solution is to re-examine the Bayesian results, to see if you made a mistake, or to understand why the frequentist results don’t actually contradict the Bayesian result.)
If you make the right probability judgements, you are supposed to make the right decision, if you correctly apply decision theory. And Beauty does make the right decision in all the Sleeping Beauty scenarios if she judges that P(Heads)=1/3 when woken before Wednesday. She doesn’t make the right decision if she judges that P(Heads)=1/2. I emphasize that this is so for all the scenarios. Beauty doesn’t have to ask herself, “what denominator should I be using?”. P(Heads)=1/3 gives the right answer every time.
Another very useful property of probability judgements is that they can be used for multiple decisions, without change. Suppose, for example, that in the GWYD or GRYL scenarios, in addition to trying not to die, Beauty is also interested in muffins.
Specifically, she knows from the start that whenever she wakes up there will be a plate of freshly-baked muffins on her side table, purchased from the cafe down the road. She knows this cafe well, and in particular knows that (a) their muffins are always very delicious, and (b) on Tuesdays, but not Mondays, the person who bakes the muffins adds an ingredient that gives her a stomach ache 10 minutes after eating a muffin. Balancing these utilities, she decides to eat the muffins if the probability of it being Tuesday is less than 30%. If Beauty is a Thirder, she will judge the probability of Tuesday to be 1⁄3, and refrain from eating the muffins, but if Beauty is a Halfer, she will (I think, trying to pretend I’m a halfer) think the probability of Tuesday is 1⁄4, and eat the muffins.
The point here is not so much which decision is correct (though of course I think the Thirder decision is right), but that whatever the right decision is, it shouldn’t depend on whether Beauty is in the GWYD or GRYL scenario. She shouldn’t be considering “denominators”.
I agree that you can make a betting/scoring setup such that betting/predicting at 50% is correct. Eg,suppose that on both Monday and Tuesday, Beauty gets to make a $1 bet at some odds. If she’s asleep on Tuesday then whatever bet she made on Monday is repeated. In that case she should bet at 1:1 odds.
Let’s work it out if Beauty has 1⁄3 probability for Heads, with the 2⁄3 probability for Tails split evenly between 1⁄3 for Tails&Monday and 1⁄3 for Tails&Tuesday.
Here, the possible actions are “don’t bet” or “bet on Heads”, at 1:1 odds (let’s say win $1 or lose $1), on each day for which a bet is active.
The expected gain from not betting is of course zero. The expected gain from betting on some day can be computed by summing the gains in each situation (Heads&Monday, Tails&Monday, Tails&Tuesday) times the corresponding probabilities:
P(Heads&Monday) x 2 + P(Tails&Monday) x (-1) + P(Tails&Tuesday) x (-1)
This evaluates to zero. Note that the gain for betting in the Heads&Monday situation is $2 because if she makes this bet, it is active for two days, on each of which it wins $1. Since the expected gain is zero for both betting and not betting, Beauty is indifferent to betting on Heads or not betting. If the odds shifted a bit, so winning the bet paid $1 but losing the bet cost $1.06, then the expected gain would be minus $0.04, and Beauty should not make the bet on Heads.
Note that when Beauty is woken on both Monday and Tuesday, we expect that she will make the same decision both days, since she has no reason to make different decisions, but this is not some sort of logical guarantee—Beauty is a human being, who makes decisions for herself at each moment in time, which can conceivably (but perhaps with very low probability) be inconsistent with other decisions she makes at other times.
Now suppose that Beauty’s probability for Heads is 1⁄2, so that P(Heads&Monday)=1/2, P(Tails&Monday)=1/4, and P(Tails&Tuesday)=1/4, though the split between Tails&Monday and Tails&Tuesday doesn’t actually matter in this scenario.
Plugging these probabilities into the formula above, we compute the expected gain to be $0.50. So Beauty is not indifferent, but instead will definitely want to bet on Heads. If we shift the payoffs to win $1 lose $1.06, then the expected gain is $0.47, so Beauty will still want to bet on Heads.
However, when a win pays $1 and a loss costs $1.06, arguments from an outside perspective say that if every day she is woken Beauty follows this strategy of betting on Heads, her expected gain is minus $0.03 per experiment. [Edit: Correction, the expected gain per experiment is minus $0.06 - I forgot to multiply by two for the bet being made twice.]
So we see that Beauty makes a mistake if her probability for Heads is 1⁄2. She makes the right decision (if she applies standard decision theory) if her probability of Heads is 1⁄3.
Note that this is about how much money Beauty gets. One can get the right answer by applying standard decision theory. There is no role for some arbitrary “scoring setup”.