Why the Two Valid Answers Approach Is Not Enough for Sleeping Beauty

This is the sixth post in my series on Anthropics. The previous one is Another Non-Anthropic Paradox: The Unsurprising Rareness of Rare Events. The next one is Lessons from Failed Attempts to Model Sleeping Beauty Problem.


When I was writing about Anthropical Motte-Bailey, I had a faint hope that insights from it would be enough to solve the Sleeping Beauty paradox. But of course, it couldn’t be that easy. My idea to simply raise the sanity waterline about anthropics to the point where providing a direct answer to this particular problem would be unnecessary also turned out to be wishful thinking. The discourse has gone on for so long and accumulated so much confusion in the process that it requires much more direct attention.

So, now that I’ve written a couple of preliminary posts and hopefully persuaded you that anthropic problems are not special and can be solved within the realm of probability theory as long as we are being precise, let’s do exactly that for the Sleeping Beauty problem.

But first, I’ll justify why the demand for an answer isn’t unreasonable, and why we shouldn’t be satisfied with the Two Valid Answers Approach. I’ll show what issues this approach has, and how it can be used to highlight the crux of disagreement between the two positions. And, given all that, what it would actually mean to solve the Sleeping Beauty problem.

Two Valid Answers Approach

According to the Two Valid Answers Approach, both 1/2 and 1/3 are correct answers, but to different questions. The problem is ambiguously formulated: both of these questions can be considered valid interpretations, and both answers are relevant to different decision theory problems.

The situation is not unlike the infamous question about a tree falling in the forest. We just need to talk about specific betting odds and stop trying to answer the general and ambiguous question about probability.

I am sympathetic towards this perspective. In my first post on anthropics, I specifically dedicated some space to explaining how different scoring rules can produce different probabilities in the Sleeping Beauty setting.

But I think it is not the whole story. The Two Valid Answers Approach resolves some of the confusion, yes, but it also sweeps the rest of it under the rug.

The issue is that people read such explanations, nod in agreement and then do not really change their initial position on the question.

“Well, yes,” they say, “I see how the other side’s perspective isn’t totally absurd, and I’m ready to give them this line of retreat. But now that it’s clear that they are answering some other question, can we finally agree that my position is the correct one?”

Is it just due to the lack of a time travel machine? I’m not so sure. It really seems that the crux isn’t fully resolved yet. After all, we are not talking about some imperfectly defined similarity-cluster category such as “sound”. We are talking about “probability”, a mathematical concept with quite a precise definition. How come we still have ambiguity about it?

And that’s why all the talk about abolishing probability in favor of pure betting odds is problematic. Not only because, if we agree to it, we lose the ability to talk about likelihoods unless we assign some utility function over events, which is silly enough in itself, but also because here it works as a curiosity stopper, hiding a valid mathematical problem.

Okay, never mind the betting odds. We can just as well talk about probability averaged per experiment and probability averaged per awakening, right?

Well, that’s a good question. Are both of them actually valid probabilities with regard to the Sleeping Beauty Problem? Yes, people constructed two scoring rules which produce probability-looking results and noticed which questions these results can serve as answers to. But has there been a proper investigation into the matter?

Balls in a Box Machine

As far as I can tell, the idea that both answers are valid should be credited to The end of Sleeping Beauty’s nightmare by Berry Groisman. In this paper, an inanimate setting, allegedly isomorphic to the Sleeping Beauty problem, is described:

An automatic device tosses a fair coin, if the coin lands ‘Tails’ the device puts two red balls in a box, if the coin lands ‘Heads’ it puts one green ball in the box. The device repeats this procedure a large number of times, N. As a result the box will be full of balls of both colours.

Groisman uses it to highlight the confusion:

Now we have arrived at a critical point. The core of the whole confusion is that we tend to regard ‘A (one) green ball is put in the box’ and ‘A green ball is picked out from the box’ as equivalent.

The reason is that the event ‘A green ball is put in the box’ and the event ‘A green ball is picked out from the box’ are two different events, and therefore their probabilities are not necessarily equal. These two events are different because they are the subject to different experimental setups: one is the coin tossing, other is picking up a ball at random from the full box.
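To make Groisman’s distinction concrete, here is a quick Monte Carlo sketch of the device (the function and variable names are mine, not from the paper):

```python
import random

def run_machine(n_tosses, rng):
    """Groisman's device: Tails puts two red balls in the box,
    Heads puts one green ball in the box."""
    box = []
    heads = 0
    for _ in range(n_tosses):
        if rng.random() < 0.5:  # Heads
            box.append("green")
            heads += 1
        else:                   # Tails
            box.extend(["red", "red"])
    return box, heads

rng = random.Random(0)
n = 100_000
box, heads = run_machine(n, rng)

# 'A green ball is put in the box' per toss: tracks the coin, ~1/2
print(heads / n)

# 'A green ball is picked out from the box': green balls are outnumbered
# roughly two-to-one by red ones, so ~1/3
picks = [rng.choice(box) for _ in range(n)]
print(sum(b == "green" for b in picks) / n)
```

The two frequencies come apart precisely because they are measured under different procedures: one per coin toss, the other per random draw from the filled box.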

And translating back to Sleeping Beauty:

If we mean ‘This awakening is a Head-awakening under the setup of wakening’, then SB’s answer to our question should be 1/3, but if we mean ‘The coin landed Heads under the setup of coin tossing’, her answer should be 1/2.

There are good things I can say about Groisman’s paper. For instance, the construction of an inanimate setting is very commendable, as it shows that anthropic problems are just probability theory problems. But there are also several issues.

First of all, the way Groisman named the two settings inevitably leads to people being confused in a particular way. They see the words “setup of wakening” and immediately think that this is what the initial question in Sleeping Beauty is supposed to mean. After all, it’s about credence that the coin is Heads on awakening, not in general, which is, apparently, what “setup of coin tossing” means.

This is not correct, because in Groisman’s “wakening setup” there is no coin toss at all. It’s simply picking a random element from a set of three. Likewise, in the “coin tossing setup” there are no awakenings or ball pickings. It’s just a random sample from a set of two. Groisman specifically made these setups totally disconnected from each other.

But in the Sleeping Beauty Problem they do appear to be connected! If the coin came Heads, then Heads & Monday Awakening always happens. This is the second issue with the paper. While Groisman is correct that the two events do not necessarily have the same probabilities, he doesn’t present a compelling argument why it would be the case for the Sleeping Beauty Problem.

In the Balls in a Box Machine there are multiple coin tosses, which fill the box with lots of different balls, from which later, as a separate procedure, one random ball is picked. But in Sleeping Beauty, there is only one coin toss that fully determines the awakening routine. Beauty’s current awakening isn’t selected from all awakenings throughout multiple iterations of the experiment. If Beauty goes to sleep on a particular week of the month, she can’t possibly wake up during a previous one.

If we assume that the probability to pick the green ball from the box is the actual question, then halfers will notice that the Balls in a Box Machine is not faithful to the Sleeping Beauty setup, because it includes this different randomization procedure. A more accurate version would be something like this:

An automatic device tosses a fair coin, if the coin lands ‘Tails’ the device puts two red balls in a box, if the coin lands ‘Heads’ it puts one green ball in the box. Then it picks a random ball from the box.
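A sketch of this halfer-adjusted machine (variable names are my own, assuming a single pick per toss):

```python
import random

rng = random.Random(2)
n = 100_000
green_picks = 0

for _ in range(n):
    if rng.random() < 0.5:   # Heads: the box holds one green ball
        box = ["green"]
    else:                    # Tails: the box holds two red balls
        box = ["red", "red"]
    # One random pick per toss, from this toss's box only
    if rng.choice(box) == "green":
        green_picks += 1

# The pick simply tracks the coin: ~1/2
print(green_picks / n)
```

With one pick per toss, picking green is equivalent to the coin landing Heads, so the two events coincide and the frequency comes out at about 1/2.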

But then thirders will demand that there have to be two ball picks on Tails, and we are back to square one: the argument about whether we are supposed to average per experiment or per awakening. And apparently no deep investigation into the validity of such probabilities has been done.
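For concreteness, the two averaging schemes the argument is about can be sketched as a simulation of repeated Sleeping Beauty experiments (names are my own):

```python
import random

rng = random.Random(1)
n = 100_000  # number of experiment repetitions

heads_experiments = 0
awakenings = 0
heads_awakenings = 0

for _ in range(n):
    heads = rng.random() < 0.5
    if heads:
        heads_experiments += 1
        awakenings += 1        # Heads: one awakening (Monday)
        heads_awakenings += 1
    else:
        awakenings += 2        # Tails: two awakenings (Monday and Tuesday)

print(heads_experiments / n)          # averaged per experiment: ~1/2
print(heads_awakenings / awakenings)  # averaged per awakening: ~1/3
```

Both numbers are straightforward frequencies; the open question the post is pointing at is which of them, if either, deserves to be called Beauty’s probability.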

Crux of Disagreement

This analysis, however, wasn’t all in vain. Despite the fact that our adventures with Balls in a Box Machine went full circle, they can help to highlight the crux of disagreement between halfers and thirders.

Notice that initially thirders were not demanding that two balls be picked on Tails. Why is that? Because there was already random sampling from three states going on. Likewise, before we assumed that it’s the ball-picking scheme that matters, halfers were not arguing against the applicability of Groisman’s model to the Sleeping Beauty problem. Why is that? Because there was random sampling from two states going on.

Both thirders and halfers can accept the initial formulation of the Balls in a Box Machine, because it includes the kind of random sampling that they believe is relevant for the Sleeping Beauty problem. And the disagreement about which sampling it is constitutes our crux. Or, in other words, it’s the disagreement about what “this awakening” means and thus how it should be treated by probability theory.

Thirders believe that this awakening should be treated as randomly sampled from three possible awakening states. Halfers believe that it should be treated as randomly sampled from two possible states, corresponding to the result of the coin toss. This is an objective disagreement that can be formulated in terms of probability theory, and at least one side is inevitably in the wrong. This is the unresolved issue that we can’t simply dismiss because both sides have a point.

To solve the Sleeping Beauty paradox is to resolve this disagreement. And that’s what I’m going to do. As a result, we are supposed to get a clear mathematical model for the Sleeping Beauty problem, generalizable to other problems that include memory erasure, rigorous with regard to betting, while also not contradicting fundamental principles such as the Law of Conservation of Expected Evidence. And, of course, everything should be justified in terms of probability theory, not vague philosophical concepts.

In order to be properly thorough, I’ll have to engage with multiple philosophical papers written on the topic throughout the decades, so this is going to take several posts. The next one will be dedicated to multiple ways people have unsuccessfully tried to model the Sleeping Beauty problem.

The next post in the series is Lessons from Failed Attempts to Model Sleeping Beauty Problem.