# What’s wrong with this picture?

Alice: “I just flipped a coin [large number] times. Here’s the sequence I got:”

(Alice presents her sequence.)

Bob: No, you didn’t. The probability of having gotten that particular sequence is 1/​2^[large number]. Which is basically impossible. I don’t believe you.

Alice: But I had to get some sequence or other. You’d make the same claim regardless of what sequence I showed you.

Bob: True. But am I really supposed to believe you that a 1/​2^[large number] event happened, just because you tell me it did, or because you showed me a video of it happening, or even if I watched it happen with my own eyes? My observations are always fallible, and if you make an event improbable enough, why shouldn’t I be skeptical even if I think I observed it?

Alice: Someone usually wins the lottery. Should the person who finds out that their ticket had the winning numbers believe the opposite, because winning is so improbable?

Bob: What’s the difference between finding out you’ve won the lottery and finding out that your neighbor is a 500 year old vampire, or that your house is haunted by real ghosts? All of these events are extremely improbable given what we know of the world.

Alice: There’s improbable, and then there’s impossible. 500 year old vampires and ghosts don’t exist.

Bob: As far as you know. And I bet more people claim to have seen ghosts than have won more than 100 million dollars in the lottery.

Alice: I still think there’s something wrong with your reasoning here.

• The reason why Bob should be much more skeptical when Alice says “I just got HHHHHHHHHHHHHHHHHHHH” than when she says “I just got HTHHTHHTTHTTHTHHHH” is that there are specific other highish-probability hypotheses that explain Alice’s first claim, and there aren’t for her second. (Unless, e.g., it turns out that Alice had previously made a bet with someone else that she would get HTHHTHHTTHTTHTHHHH, at which point we should suddenly get more skeptical again.)

Bob’s perfectly within his rights to be skeptical, of course, and if the number of coin flips is large enough then even a perfectly honest Alice is quite likely to have made at least one error. But he isn’t entitled to say, e.g., that Pr(Alice actually got HTHHTHHTTHTTHTHHHH | Alice said she got HTHHTHHTTHTTHTHHHH) = Pr(Alice actually got HTHHTHHTTHTTHTHHHH) = 2^-20. Alice’s testimony provides non-negligible evidence: empirically, when people report things they have no particular reason to get wrong, they’re quite often right.

(But, again: if Bob learns that Alice had a specific reason to want it thought she got that exact sequence of flips, he should get more skeptical again.)

So, now suppose Alice says “I just won the lottery” and Amanda says “I just saw a ghost”. What should Bob’s probability estimates be in the two cases?

Empirically, so far as I can tell, a good fraction of people who claim to have won the lottery actually did so. Of course people sometimes lie, but you have to weigh “most people don’t win the lottery on any given occasion” against “most people don’t falsely claim to have won the lottery on any given occasion”. I guess Bob’s posterior Pr(Alice won the lottery) should be somewhere in the vicinity of 1/2: enough to be decently convinced by a modest amount of further evidence, unless some other hypothesis (e.g., Alice is trying to scam him somehow, or she’s being seriously hoaxed) gets enough evidence to be taken seriously. That could happen if, say, Alice, having allegedly won the lottery, asks Bob for a loan to be repaid with exorbitant interest.

On the other hand, there are lots and lots of tales of ghosts and (at best) very few well verified ones. It looks as if many people who claim to have seen ghosts probably haven’t. Further, there are reasons to think it very unlikely that there are ghosts at all (e.g., it seems clear that human thinking is done by human brains, and by definition a ghost’s brain is no longer functioning) and those reasons seem quite robust—they aren’t, e.g., dependent on details of our current theories of quantum physics or evolutionary biology. So we should set Pr(ghosts are real) extremely small, and Pr(Amanda reports a ghost | Amanda hasn’t really seen a ghost) not terribly small, which means Pr(Amanda has seen a ghost | Amanda reports a ghost) is still small.

Bob’s last comparison (claims of seeing ghosts, against actual wins of big lottery prizes) is of course nonsensical, and as long as it’s of the form “more claims of ghosts than X” it actually goes the wrong way for his purposes. What he wants is more actual sightings of ghosts and fewer claims of ghosts.

• There’s a nonzero probability that the lottery is a complete scam, and the winners are entirely fictional. (The lottery in 1984 worked like this, but I’m not paranoid enough to believe this is true in real life.)

• a nonzero probability

Sure, but a small enough one that I don’t think it makes much difference to anything here. I might be missing something, though; if you disagree, would you like to say why?

• by definition a ghost’s brain is no longer functioning

There are definitions about ghost brains..? 8-0

• By definition, a ghost is of someone who has (bodily) died. By definition, to be bodily dead means to have a brain that is no longer functioning.

• No love for uploads, I see :-D

• They’ve never yet been known to leave ghosts.

Slightly more seriously: yeah, I agree that human thinking could happen on other substrates besides human brains, but no instances of that have been reported so far, and we don’t know of any plausible mechanism that would make it happen after the brain the thinking started off in has died, and in any case you’re just yanking my chain so I’ll shut up.

• You didn’t even address the elderly vampires. Did you not read the post?

• I don’t know whether Bob’s trolling, but I’m pretty sure you are.

[EDITED to add: I see someone downvoted you; it wasn’t me.]

• What is the probability of someone not reading the post and posting a comment which happens to use exactly the same sequence of about 2500 characters as this?

Assuming at least 4 bits per character, the probability is at most 2^-10000. Quite unlikely, if you ask me.

• My response is here, a post on my blog from last August.

Basically when Bob sees Alice present the particular sequence, he is seeing something extremely improbable, namely that she would present that individual sequence. So he is seeing extremely improbable evidence which strongly favors the hypothesis that something extremely improbable occurred. He should update on that evidence by concluding that it probably did occur.

Regarding the lottery issue, we have the same situation. If you play the lottery, see the numbers announced, and go, “I just won the lottery!” you are indeed probably wrong. Look again. In most cases you will see that the numbers don’t quite match. In the few cases where they do match, you are seeing extremely improbable evidence that you won the lottery, namely that your numbers match after repeated comparisons.

• I don’t see the paradox. P(Alice saw this sequence) is low, and P(Alice presented this sequence) is low, but P(Alice saw this sequence | Alice presented this sequence) is high, so Bob has no reason to be incredulous.

• Bob has always been like this.

• I think, but am not certain, that you’re missing the point, by examining Bob’s incredulity rather than the problem as stated. Let’s say your probability that the universe is being simulated is 2^-x.

Alice flips a coin (x+1) times. You watch her flip the coins, and she carefully marks down the result of each flip.

No matter what sequence you watch, and she records—that sequence has less likelihood of having occurred naturally than that the universe is simulated, according to your priors. If it helps, imagine that a coin you know to be fair turns up Heads each time. (A sequence of all heads seems particularly unlikely—but every other sequence is equally unlikely.)

• I agree that the probability of seeing that exact sequence is low. Not sure why that’s a problem, though. For any particular random-looking sequence, Bob’s prior P(see this sequence | universe is simulated) is pretty much equal to P(see this sequence | universe is not simulated), so it shouldn’t make Bob update.

• Suppose Alice and Bob are the same person. Alice tosses a coin a large number of times and records the results.

Should she disbelieve what she reads?

• I would personally argue that, even given any particular non-fatal objection to the core of this article, there is something interesting to be found here, if one is charitable. I recommend Chapter 2, Section 4 of Nick Bostrom’s Anthropic Bias: Observation Selection Effects in Science and Philosophy, and the citations therein, for further reading. There also might be more recent work on this problem that I’m unaware of. We might refer to this as defining the distinction between surprising and unsurprising improbable events.

It also seems noteworthy that user:cousin_it has done precisely what Bostrom does in his book: sidesteps the issue by focusing on determining the conditional probability implicit in the problem. (In Bostrom’s case, however, it is P(There exists an ensemble of universes | An observer observes a fine-tuned universe)). Perhaps this concentration on conditional probabilities is a reduction of the problem, but it does not seem to cause the confusion to evaporate in a puff of smoke, as we should probably expect.

It is true that sometimes humans systematically happen upon wrong-headed ideas, but it may also be the case that user:CronoDAS and others have converged upon a substantial problem.

• Substantial? No—it adds up to normality. Interesting? Yes.

Imagine two situations of equal improbability:

In one, Alice flips a coin N times in front of a crowd, and achieves some specific sequence M.

In the other, Alice flips a coin N /​ 2 times in front of a crowd, and achieves some specific sequence Q; she then opens an envelope, and reveals a prediction of exactly the sequence that she just flipped.

These two end results are equally improbable (both end results encode N bits of information—to see this, imagine that the envelope contained a different sequence than she flipped), but we attach significance to one result (appropriately) and not the other. What’s the difference between the two situations?

• I do not think these events are equally improbable (thus, equally probable).

The specific sequence, M, is some sequence in the space of all possible sequences; “… achieves some specific sequence M” is like saying “there exists an M in the space of all sequences such that Alice’s result is M.” That will always be true; that is, one can always retroactively say “Alice’s end-result is some specific sequence.”

On the other hand, it’s a totally different thing to say “Alice’s end-result is some specific sequence which she herself picked out before flipping the coin.”

• All sequences, both written and flipped, are equally improbable. The difference is in treating the cases where the two sequences are identical as logically distinct from all other possible combinations of sequences. They’re not nearly as distinct as you might think; imagine if she’s off by one. Still pretty improbable, just not -as- improbable. Off by three, a little less improbable still. Equivalent using a Caesar cipher using blocks of 8? Equivalent using a hashing algorithm? Equivalent using a different hashing algorithm?

Which is to say: There is always going to be a relationship that can be found between the predicted sequence and the flipped sequence. Two points make a line, after all.

• I agree with you that the probability of Alice’s sequence being a sequence will always be the same, but the reason Alice’s correct prediction makes a difference between the two situations is that the probability of her randomly guessing correctly is so low. A correct guess may therefore indicate something about Alice and her actions (that is, given a complete set of information regarding Alice, the probability of her correctly guessing the sequence of coin flips might be much higher).

Am I misunderstanding the point you’re making w/​ this example?

• Which seems more unlikely: The sequences exactly matching, or the envelope sequence, converted to a number, being exactly 1649271 plus the flipped sequence converted to a number?

• They’re equally likely, but, unless Alice chose 1649271 specifically, I’m not quite sure what that question is supposed to show me, or how it relates to what I mentioned above.

Maybe let me put it this way: We play a dice game; if I roll 3, I win some of your money. If you roll an even number, you win some of my money. Whenever I roll, I roll a 3, always. Do you keep playing (because my chances of rolling 3-3-3-3-3-3 are exactly the same as my chances of rolling 1-3-4-2-5-6, or any other specific 6-numbered sequence) or do you quit?

• Substantial? No—it adds up to normality. Interesting? Yes.

I don’t understand what you mean by this.

Imagine two situations of equal improbability:

In one, Alice flips a coin N times in front of a crowd, and achieves some specific sequence M.

In the other, Alice flips a coin N / 2 times in front of a crowd, and achieves some specific sequence Q; she then opens an envelope, and reveals a prediction of exactly the sequence that she just flipped.

These two end results are equally improbable (both end results encode N bits of information—to see this, imagine that the envelope contained a different sequence than she flipped), but we attach significance to one result (appropriately) and not the other. What’s the difference between the two situations?

It is important to note that to capture this problem entirely we must make it explicit that the person observing the coin flips has not only a distribution over sequences of coin flips, but a distribution over world-models that produce the sequences. It is often implicit, and sometimes explicitly assumed, in coin flipping examples, that a normal human flipping a fair coin is something like our null hypothesis about the world. Most coins seem fair in our everyday experience. Alice correctly predicting the sequence that she achieves is evidence that causes a substantial update on our distribution over world-models, even if the two sequences are assigned equal probability in our distribution over sequences given that the null hypothesis is true.

You can also imagine it as the problem of finding an efficient encoding for sequences of coin flips. If you know that certain subsequences are more likely than others, then you should find a way to encode more probable subsequences with fewer bits. Actually doing this is equivalent to forming beliefs about the world. (Like ‘The coin is biased in this particular way’, or ‘Alice is clairvoyant.’)

• Alice correctly predicting the sequence that she achieves is evidence that causes a substantial update on our distribution over world-models, even if the two sequences are assigned equal probability in our distribution over sequences given that the null hypothesis is true.

Except that we’re not updating all distributions of all possible world-models, or every single sequence would be equally surprising. You’re implicitly looking for evidence that, say, Alice is clairvoyant—you’ve elevated that hypothesis to your awareness before you ever looked at the evidence.

• Except that we’re not updating all distributions of all possible world-models, or every single sequence would be equally surprising.

If you don’t even know what you mean by surprise (because that’s what we’re ostensibly trying to figure out, right?), then how can you use the math to deduce that some quantitative measure of surprise is equal in all cases?

I still think this is just a confusion over having a distribution over sequences of coin flips as opposed to a distribution over world-models.

Suppose you have a prior distribution over a space of hypotheses or world-models M, and denote a member of this space as M’. Given data D, you can update using Bayes’ Theorem and obtain a posterior distribution. We can quantify the difference between the prior and posterior using the Kullback-Leibler divergence and use it as a measure of Bayesian surprise. To see how one thing with the same information content as another thing can be more or less surprising, imagine that we have an agent using this framework set in front of a television screen broadcasting white noise. The information content of each frame is very high because there are so many equally likely patterns of noise, but the agent will quickly stop being surprised because it will settle on a world-model that predicts random noise, and the difference between its priors and posteriors over world-models will become very small.

If in the future we want to keep using a coin flip example, I suggest forgetting things that are so mind-like as ‘Alice is clairvoyant’, and maybe just talk about biased and unbiased coins. It seems like an unnecessary complication.
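To make the coin version concrete, here is a minimal Python sketch of Bayesian surprise as the KL divergence from prior to posterior. The three hypotheses (fair, heads-biased, tails-biased), their prior weights, and the two example sequences are all assumptions chosen for illustration, not anything fixed by the discussion above.

```python
import math

def posterior(prior, heads_probs, flips):
    """Bayes update of a prior over coin biases after seeing a string of 'H'/'T'."""
    heads = flips.count("H")
    tails = len(flips) - heads
    joint = [p * (q ** heads) * ((1 - q) ** tails)
             for p, q in zip(prior, heads_probs)]
    total = sum(joint)
    return [j / total for j in joint]

def bayesian_surprise_bits(prior, post):
    """KL(posterior || prior) in bits; terms with zero posterior mass contribute 0."""
    return sum(q * math.log2(q / p) for p, q in zip(prior, post) if q > 0)

# Hypothetical world-models: P(heads) under each, and how much we believe each.
heads_probs = [0.5, 0.9, 0.1]     # fair, heads-biased, tails-biased
prior = [0.98, 0.01, 0.01]        # assumption: most coins are fair

all_heads = "H" * 20
mixed = "HTHHTHTTHTTHTHHTTHHT"    # 20 flips, random-looking

# Under the fair-coin model both sequences have probability 2**-20 ...
surprise_all_heads = bayesian_surprise_bits(prior, posterior(prior, heads_probs, all_heads))
surprise_mixed = bayesian_surprise_bits(prior, posterior(prior, heads_probs, mixed))

# ... yet only one of them moves the distribution over world-models much.
print(f"all heads: {surprise_all_heads:.2f} bits")
print(f"mixed:     {surprise_mixed:.2f} bits")
```

With these made-up numbers the all-heads sequence shifts nearly all the belief onto the heads-biased coin, while the mixed sequence leaves the prior almost untouched, even though both carry the same 20 bits of self-information under the fair-coin model.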

• If you don’t even know what you mean by surprise (because that’s what we’re ostensibly trying to figure out, right?), then how can you use the math to deduce that some quantitative measure of surprise is equal in all cases?

Because the number of bits of information is the same in all cases. Any given random sequence provides evidence of countless extremely low probability world models—we just don’t consider the vast majority of those world-models because they aren’t elevated to our attention.

If in the future we want to keep using a coin flip example, I suggest forgetting things that are so mind-like as ‘Alice is clairvoyant’, and maybe just talk about biased and unbiased coins. It seems like an unnecessary complication.

It’s both necessary and relevant. Indeed, I crafted my example to make your brain come up with that answer. Your conscious mind, once aware of it, probably immediately threw it into the “Silly explanation” column, and I’d hazard a guess that if asked, you’d say you wrote it down as a joke.

Because it clearly isn’t an example of a world-model being allocated evidence. Your explanation is post-hoc—that is, you’re rationalizing. Your description would be an elegant mathematical explanation—I just don’t think it’s correct, as pertains to what your mind is actually doing, and why you find some situations more surprising than others.

• Because the number of bits of information is the same in all cases.

I don’t know why you’re using self-information/​surprisal interchangeably with surprise. It’s confusing.

Any given random sequence provides evidence of countless extremely low probability world models—we just don’t consider the vast majority of those world-models because they aren’t elevated to our attention.

Like in the sense that there are hypotheses that something omniscient would consider more likely conditional on Alice doing something surprising, that humans just don’t think of because they’re humans? I don’t expect problems coming up with a satisfactory description of ‘the space of all world-models’ to be something we have to fix before we can say anything important about surprise.

Because it clearly isn’t an example of a world-model being allocated evidence. Your explanation is post-hoc—that is, you’re rationalizing. Your description would be an elegant mathematical explanation—I just don’t think it’s correct, as pertains to what your mind is actually doing, and why you find some situations more surprising than others.

Maybe there’s more to be said about the entire class of things that humans have ever labeled as surprising, but this does capture something of what humans mean by surprise, and we can say with particular certainty that it captures what happens in a human mind when a visual stimulus is describable as ‘surprising.’ The framework I described has, to my knowledge, been shown to correspond quite closely to our neuroscientific understanding of visual surprise and has been applied in machine learning algorithms that diagnose patients based on diagnostic images. There are algorithms that register seeing a tumor on a CT scan as ‘surprising’ in a way that is quite likely to be very similar to the way that a human would see that tumor and feel surprised. (I don’t mean that it’s similar in a phenomenological sense. I’m not suggesting that these algorithms have subjective experiences.) I expect this notion of surprise to be generalizable.

• I expect this notion of surprise to be generalizable.

Which is what I’m trying to get at. There’s -something- there, more than “amount of updates to world-models”. I’d guess what we call surprise has a complex relationship with the amount of updates applied to world-models, such that a large update to a single world-model is more surprising than an equal “amount” of update applied across one thousand.

• I have seen this argument on LessWrong before, and don’t think the other explanations are as clear as they can be. They are correct though, so my apologies if this just clutters up the thread.

The Bayesian way of looking at this is clear: the prior probability of any particular sequence is 1/2^[large number]. Alice sees this sequence and reports it to Bob. Presumably Alice intends to tell Bob the truth about what she saw, so let’s say that there’s a 90% chance that she will not make a mistake during the reporting. The other 10% covers all cases ranging from misremembering or misreading a flip to outright lying. The point is that if Alice is lying, this 10% has to be divided up among the 2^[large number]-1 other possible sequences: if Alice is going to lie, any particular sequence is very unlikely to be presented by her as the true sequence, since there are a lot of ways for her to lie. So, assuming that Alice was intending to speak the truth, her giving that sequence is very strong evidence (in my example 9*(2^[large number]-1):1) that that particular sequence was indeed the true one rather than any specific other sequence. This is ‘coincidentally’ precisely strong enough to bring Bob’s posterior belief that the sequence is correct to 90%.

A fun side remark is that the above also clearly shows why Bob should be more skeptical when Alice presents sequences like HHHHHHHHHH or HTHTHTHTHTHT. If Alice were planning on lying, these are exactly the sequences that she might pick with greater-than-uniform probability out of all the sequences that were not thrown. Each possible actual sequence therefore contributes a higher-than-average amount of probability that Alice would present one of these special sequences. So the fact that Alice informs Bob of such a sequence is weaker evidence for this particular sequence over any other one than it would be in the regular case, and Bob ends up with a lower posterior that the sequence is actually correct.
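For anyone who wants to check the arithmetic, here is a small Python sketch of the update described in this comment. The 20 flips, the 90% honesty figure, and especially the "1 in 10 wrong reports pick all heads" number are illustrative assumptions; the function name is made up for this example.

```python
from fractions import Fraction

def posterior_report_is_true(n_flips, p_honest, p_wrong_report_is_this):
    """P(reported sequence is the true one | Alice reported it).

    n_flips: number of fair flips, so each sequence has prior 2**-n_flips.
    p_honest: chance Alice reports exactly what she saw.
    p_wrong_report_is_this: chance that a wrong report (error or lie)
        lands on this particular sequence.
    """
    prior = Fraction(1, 2 ** n_flips)
    true_and_reported = prior * p_honest
    false_and_reported = (1 - prior) * (1 - p_honest) * p_wrong_report_is_this
    return true_and_reported / (true_and_reported + false_and_reported)

n = 20
honest = Fraction(9, 10)

# Random-looking sequence: wrong reports spread evenly over the other 2**n - 1.
uniform_wrong = Fraction(1, 2 ** n - 1)
ordinary = posterior_report_is_true(n, honest, uniform_wrong)

# Evidence strength for "this sequence" over one specific other sequence.
likelihood_ratio = honest / ((1 - honest) * uniform_wrong)

# "Special" sequence such as all heads: assume 1 in 10 wrong reports pick it.
special = posterior_report_is_true(n, honest, Fraction(1, 10))

print(ordinary)                 # 9/10
print(likelihood_ratio)         # equals 9 * (2**20 - 1)
print(float(special))           # far below 9/10
```

Exact fractions are used so that the 90% posterior comes out exactly rather than approximately. The only thing that changes between the two cases is how likely a false report is to land on the reported sequence.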

• An analogous question that I encountered recently when buying a Powerball lottery ticket just for the heck of it (also because its jackpot was $1.5 billion and the expected value of buying a ticket was actually approaching a positive net reward):

I was in a rush to get somewhere when I was buying the ticket, so I thought, “instead of trying to pick meaningful numbers, why not just pick something like 1-1-1-1-1-1? Why would that drawing be strictly more improbable than any other random permutation of 6 numbers from 1 to 60, such as 5-23-23-16-37-2?” But then the store clerk told me that I could just let the computer pick the numbers on my ticket, so I said “OK.”

Picking 1-1-1-1-1-1 SEEMS like you are screwing yourself over and requiring an even more improbable outcome to take place in order to win...but are you REALLY? I don’t see how....

I’m sure if 1-1-1-1-1-1 were actually drawn, there would be investigations about whether that drawing was rigged. And if I won with ANY ticket (such as 5-23-23-16-37-2), I would start to wonder whether I was living in a simulation centered around my life experience. But aren’t these intuitions going astray? Aren’t the probabilities all the same?

• The probabilities are all the same. But you are probably screwing yourself over (above and beyond the screwage of buying a ticket in the first place, at least if wealth is your goal) if you pick 1,2,3,4,5,6 or something of the kind—because more other people will have picked that than 1,4,5,18,23,31 or some other random-looking set, so if you win you’ll have to share the prize with more people. (Assuming that that’s what happens when there are multiple jackpot winners. It usually is.)

• Nitpick: Balls are drawn without replacement in the Powerball lottery, so 1-1-1-1-1-1 is not a possible winning combination. 1-2-3-4-5-6 is, though.

• I treated the ticket as an experiment into the question of whether or not I’m living in a simulation, treating it as weak evidence against an already weak hypothesis.

• It seems to me that either Alice is lying or she is telling the truth. The actual amount of possible lies at her disposal is pretty irrelevant to the question of whether she is lying or not.

• For any n coin flips p(sequence) = 1/​2^n right?

For 100 coin flips, p(sequence) as a result is 1/2^100 ≈ 7.8886091e-31.

You have observed an event that could have gone 2^100 different ways, and found one version of the result. Just because you have done something with a specific probability doesn’t mean it’s a low probability.

The probability of getting a sequence is (pretty much) 1 (given that flipping 100 coins in a thought experiment is pretty safe)

The probability of getting that sequence again is quite low.

• The probability of all heads is the same as the probability of any other sequence of flips. You’d feel somewhat differently about flipping heads one hundred times in a row than about most other sequences, however, even though it’s just as likely as any other sequence of flips.

• Let’s divide possible sequences into two broad classes: Distinguished, and undistinguished. Distinguished sequences are those which, for example, are predicted in advance of the coin flips; they have a property which sets them apart from the undistinguished sequences. Undistinguished sequences are all sequences which are isomorphic with respect to the rest of the universe.

All heads is a naturally distinguished sequence; all tails, likewise. Repeating patterns in general. Likewise simple ASCII encodings of binary messages (“This is God, quit flipping that coin”).

Once you notice that every sequence can be distinguished somehow (some more readily or naturally than others; for any given input, there’s a function that will turn that input into any given output), and that all sequences belong to the “distinguished” set, the odds of getting an interesting-by-some-criteria sequence become 1. Therefore, the mere fact of an interesting quality of a sequence isn’t, in itself, interesting, but rather the class of qualities that interesting quality belongs to.

All heads is equivalent to all tails. HTHTHT, repeated as necessary, is equivalent to THTHTH, and is less interesting than HHHHHH and TTTTTT. HHTHHTHHT, TTHTTHTTH, HTHHTHHTH, THTTHTTHT, HHHHHHHHH, TTTTTTTTT.

You’ll notice that all-heads belongs to multiple sets of interesting qualities; repeated single digit, repeated digit pairs, repeated digit triplets.

There’s probably some mathematical language that could be used to describe the “interestingness” of a sequence (I imagine the complexity of the property counts against it?), but I am simply ignorant of it. Depending on the interestingness of the actual sequence Alice shows to Bob, the odds of something of that level of interest occurring should be computable, and may or may not be more likely than some kind of deception or bizarre behavior on the part of the universe.

• I think the actual property you’re looking for is something like “probability that Alice would produce this sequence by means other than honest coin-flipping accurately reported”. All-H is higher probability than something random-looking because there are more-likely scenarios where Alice has a motive for reporting it falsely.

If Alice were a randomly chosen computer program or something, then what you’re asking for would be more or less a Solomonoff prior, which is quite popular around these parts, but real-world Alices probably have different patterns of likely deception.

• Well, Alice -could- produce a random sequence, then crunch algorithms until she finds an encoding methodology that returns a desired output string from the given random sequence, or run standard encoding methodologies until one returns an interesting-looking (i.e., seemingly non-random) result, such as the aforementioned ASCII binary encoding for “This is God, quit flipping that coin.”

Which is to say, assuming somebody produced a sequence of coin flips, and presented a binary encoding, given some arbitrary encoding schema, that produced the result “This is God, quit flipping that coin.”, there should be a way of determining how nonrandom the result actually is. If we took a sequence and decoded it as ASCII, and it produced that phrase, that would be considerably stronger evidence than if we took a sequence, ran it through a database of character encodings, and discovered that it produced that string after we ran it through 7zip, inserted a 1 every third bit and a 0 every eleventh bit, and Base64 decoded the results.

The Solomonoff prior of a given random sequence should be, on average, the same as the odds of getting a given item in that sequence if it were genuinely random (since any probability assigned to cheating is unassigned from random, and there are the same number of items overall). The odds of getting -any- sequence of length N, given you flip a coin N times, is approximately 1, less situations where the coin spontaneously stops existing or whatnot. A Solomonoff prior for a truly random number doesn’t resolve the dilemma that a given apparently random sequence is an incredibly unlikely outcome, and, by naive analysis, it’s -always- more likely that Alice cheated than that this particular sequence came up by chance.

Which is to say, I’m trying to distinguish between the probability of an event occurring and whether or not that event occurring is actually surprising. Which is probably related to Solomonoff complexity, as it seems like it should be proportional to its inverse.

• by naive analysis, it’s -always- more likely that Alice cheated than that this particular sequence came up by chance.

Why? I agree that Pr(Alice cheated) is likely higher than Pr(these coin-flip results on this occasion), but that’s the wrong comparison. Pr(Alice cheated to produce these coin-flip results) is typically about 2^-n times Pr(Alice cheated), and in particular is generally smaller than Pr(Alice got these coin-flip results fairly).

• Wait. Are you arguing that, given two possibilities: Alice cheated to produce (random sequence), and Alice produced (random sequence) randomly, given that it requires the same amount of information to produce the sequence in both cases (n bits), Alice cheating to produce a given sequence is just as unlikely, for a sufficiently random sequence, as arriving at the random sequence randomly?

• Pretty much, yes. (Not necessarily exactly equally unlikely—human cheaters and perfect unbiased uncorrelated coin flippers don’t produce the same output. But if you’ve got some arbitrary 100-long sequence of coin flips, you don’t get to say “Alice must have cheated because that sequence is unlikely by chance”; it’s unlikely by cheating too for the exact same reason.)

• Ok. I think part of the issue [ETA: with our mutual understanding of each other, not with you] is that you’re focused on the “You’re lying” part of the conversation.

I’m considering it in the context of this: “My observations are always fallible, and if you make an event improbable enough, why shouldn’t I be skeptical even if I think I observed it?”

Granted, his observations have N bits of information (at least), the same as the situation with cheating, and it’s at least as improbable that he’d observe a given sequence of length N when something else entirely happened as that the given sequence of length N itself happened, so in practice, it’s still -certainly- more likely that he actually observed the observation he observed.

The paradox isn’t there. The paradox is that we would, in fact, find some sequences unbelievable, even though they’re exactly as likely as every other sequence. If the sequence was all heads 100 times in a row, for instance, that would be unbelievable, even though a sequence of pure heads is exactly as likely as any other sequence.

The paradox is in the fact that the sequence is undefined, and for some sequences, we’d be inclined to side with Alice, and for other sequences, we’d be inclined to side with Bob, even though all possible sequences of the same length are equally likely.

ETA:

This is what I was getting at with the difference between the reference classes of “distinguished” and “undistinguished”.

• if you make an event improbable enough, why shouldn’t I be skeptical even if I think I observed it?

You should. You should be aware that you might e.g. have made a mistake and slightly misremembered (or miscopied, etc.) the results of the coin flips, for instance.

we would, in fact, find some sequences unbelievable

We might say that. We might even think it. But what we ought to mean is that we find other explanations more plausible than chance in those cases. If you flip a coin 100 times and get random-looking results: sure, those particular results are very improbable, but very improbable things happen all the time (as in fact you can demonstrate by flipping a coin 100 times). What you should generally be looking at is not probabilities but odds. That random-looking sequence is neither much more nor much less likely than any other random-looking sequence of 100 coin-flips, so the fact that it’s improbable doesn’t give you reason to disbelieve it—you don’t have a better rival hypothesis. But if you flip all heads, suddenly there are higher-probability alternatives. Not because all-heads is especially unlikely by chance, but because it’s especially likely by not-chance. Maybe the coin is double-headed. Maybe it’s weighted in some clever way[1]. Maybe you’re hallucinating or dreaming. Maybe some god is having a laugh. All these things are (so at least it seems) much more likely to produce all-heads than a random-looking sequence.

[1] I think I recall seeing an analysis somewhere that found that actually weighting a coin can’t bias its results much.
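To put rough numbers on the point about rival hypotheses, here is a minimal Python sketch. It assumes only two hypotheses (a fair coin and a double-headed coin) and a made-up prior of one in a million for the double-headed coin; the function name `posterior_fair` and both numbers are illustrative and not from the thread.

```python
from fractions import Fraction

def posterior_fair(sequence: str, prior_double_headed: Fraction) -> Fraction:
    """Posterior probability that the coin is fair, given the observed flips.

    Only two hypotheses are modelled: a fair coin and a double-headed coin.
    """
    n = len(sequence)
    prior_fair = 1 - prior_double_headed
    # A fair coin gives every specific sequence probability 2^-n.
    like_fair = Fraction(1, 2 ** n)
    # A double-headed coin can only ever produce all heads.
    like_double = Fraction(1) if set(sequence) <= {"H"} else Fraction(0)
    joint_fair = prior_fair * like_fair
    joint_double = prior_double_headed * like_double
    return joint_fair / (joint_fair + joint_double)

prior = Fraction(1, 1_000_000)   # assumed: one coin in a million is double-headed
mixed = "HTHHTHHTTHTTHTHHHHTH"    # 20 random-looking flips
heads = "H" * 20                  # 20 heads in a row

print(float(posterior_fair(mixed, prior)))
print(float(posterior_fair(heads, prior)))
```

Both sequences have the same probability under the fair coin. Under these assumed numbers the random-looking one leaves the fair-coin posterior at 1, because the rival hypothesis cannot produce it, while 20 heads in a row pulls it just below one half.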

• But if you flip all heads, suddenly there are higher-probability alternatives. Not because all-heads is especially unlikely by chance, but because it’s especially likely by not-chance. Maybe the coin is double-headed. Maybe it’s weighted in some clever way[1]. Maybe you’re hallucinating or dreaming. Maybe some god is having a laugh. All these things are (so at least it seems) much more likely to produce all-heads than a random-looking sequence.

Which is, I think, what is interesting about this: All-heads is no more improbable than any other random sequence, but in the case of an all-heads sequence, suddenly we start looking for laughing gods, hallucinations, or dreams as an explanation.

Which is to say, the interesting thing here is that we’d start looking for explanations of an all-heads sequence, even though it’s no more improbable than any other sequence.

• No—not “suddenly we start looking for”. Suddenly those are better explanations than if the sequence of coin flips had been random-looking.

• Like gods having a laugh?

You didn’t, and wouldn’t, leap into the better explanations. You leapt fully into any explanation except chance, without regard for whether or not it was a better explanation.

Gods having a laugh aren’t something you even think of if you aren’t looking for an explanation.

• Gods having a laugh are a pretty terrible explanation for anything, and their inclusion here was mostly gjm having a laugh.

The borderline between “suddenly we start looking for a better explanation” and “suddenly better explanations start occurring to us” is an extremely fuzzy one. My reason for preferring the latter framing is that what’s changed isn’t that randomness has become worse at explaining our observations, but that some non-random explanation has got better.

• One is a very good mathematical explanation.

The other is why “Gods having a laugh” would actually cross your mind. You include that as a joke because it rings true.

• One is a very good mathematical explanation. The other is [...]

My apologies for being dim: what are “one” and “the other” here?

because it rings true

How do you know why I did it? (I say: you don’t know why I did it, you’re just pretending to. That’s as rude as it is foolish.)

• My apologies for being dim: what are “one” and “the other” here?

Suddenly looking for explanations, versus explanations suddenly begin occurring to us.

How do you know why I did it? (I say: you don’t know why I did it, you’re just pretending to. That’s as rude as it is foolish.)

Because of how humor works. It depends upon a shared/​common experience. You not only expect to think of gods laughing at you, in that situation—because you’ve thought of exactly that in similar weird circumstances in your life—you expect me to think of gods laughing at me, in that situation. (And gods laughing at me would, in fact, be something I considered given a long-enough sequence of all-Heads, so the joke didn’t fall flat. I’ve thought of some equivalent of gods laughing at me for far less unusual coincidences, after all.)

I didn’t need you to tell me it was a joke, however. I knew that explanation would occur to you in the real world before you ever mentioned it—because 100 heads in a row would be, quite simply, unbelievable, and any sane person would be questioning -everything- in lieu of believing it happened by chance—even though any other random sequence is just as unlikely. It’s just how our brains work.

• Suddenly looking for explanations, versus explanations suddenly begin occurring to us.

OK, that’s what I first thought. But then I can’t make sense of what you say about these: “One is a very good mathematical explanation” and “the other is why ‘Gods having a laugh’ would actually cross your mind”. From the “actually” in the second, it seems as if you’re endorsing that one, in which case presumably “a very good mathematical explanation” is intended as a criticism. Which doesn’t make any sense to me.

How do you know why I did it?

Because of how humor works.

But your analysis on the basis of “how humor works” doesn’t give any reason at all for any preference between “suddenly start looking for explanations” and “explanations start occurring to us”. It hinges only on the fact that, one way or another, many people in such a weird situation would start considering hypotheses like “gods messing with us” even if they have previously been very sure that no gods exist.

any sane person would be questioning -everything- in lieu of believing it happened by chance

That may very well be correct. But in so far as we frame that as “I don’t believe X happened because its probability is very low”, all that indicates is that we intuitively think about probabilities (or at least express our thinking about probabilities) wrongly. The thing that triggers such thoughts is the occurrence of a low-probability event that feels like it should have a better explanation, even if the thought we’re then inclined to think doesn’t explicitly have that last bit in it.

(It’s not necessarily that concrete better explanations occur to us. It’s that we have a heuristic that tells us there should be one. What I wrote before kinda equivocates between those, for which I apologize; I am not sure which I had in mind, but what I endorse after further thought is the latter, together with the observation that what makes this heuristic useful is the fact that its telling us “there should be a better explanation” correlates with there actually being one.)

• From the “actually” in the second, it seems as if you’re endorsing that one, in which case presumably “a very good mathematical explanation” is intended as a criticism. Which doesn’t make any sense to me.

I was implying that it is a rationalization. Perhaps a fair one—I have no ready counterargument available—but not the real reason for the behavior.

It’s not necessarily that concrete better explanations occur to us. It’s that we have a heuristic that tells us there should be one.

Yes! Exactly. And moreover—that heuristic is, as you say, useful. What is the heuristic measuring, and why?

Skipping ahead a bit: The ability to notice which improbable things require explanations is, perhaps, the heart of scientific progress (think of data mining—why can’t we just run a data mining rig and discover the fundamental equations of the universe? I’d bet all the necessary data already exists to improve our understanding of reality by as much again as the difference between Newtonian and Relativistic understandings of reality). Why does it work, and how can we make it work better?

• the interesting thing here is that we’d start looking for explanations of an all-heads sequence, even though it’s no more improbable than any other sequence.

It’s no more probable under the null hypothesis, but much more probable under more probable than average alternative hypotheses.

• It’s no more probable under the null hypothesis, but much more probable under more probable than average alternative hypotheses.

Such as gods interfering with our lives?

Imagine, for a moment, you’ve ruled out all of the probable explanations. Are you still going to be looking for an alternative explanation, or will you accept that it’s chance?

• Or the coin being a cheat, or some cheating or “non-random” effect in the situation. Delusional recollection of events.

How did I “rule out” the alternatives? When I imagine me doing that, I imagine me reasoning poorly. I go by Jaynes’ policy of having a catch all “something I don’t understand” hypothesis for multiple hypothesis testing. In this case, it would be “some agent action I can’t detect or don’t understand the mechanism of”. How did I rule that out?

Suppose it’s 1,000,000 coin flips, all heads. The probability of that is pretty damn low, and much much lower than my estimates for the alternatives, including the “something else” hypothesis. You can make some of that up with a sampling argument about all the “coin flip alternatives” one sees in a day, but that only takes you so far.

I don’t see how I would ever be confident that 1,000,000 came up all heads with “fair” coin flipping.

• It’s a fair coin. It just has two heads on it.

• The probability of that is pretty damn low

The probability of any specific sequence of 1M coin flips is “pretty damn low” in the same sense. The relevant thing here is not that that probability is low when they’re all heads, but that the probability of some varieties of “something else” is very large, relative to that low probability. Or, more precisely, what sets us thinking of “something else” hypotheses is some (unknown) heuristic that tells us that it looks like the probability of “something else” should be much bigger than the probability of chance.

(I guess the heuristic looks for excessive predictability. As a special case it will tend to notice things like regular repetition and copies of other sequences you’re familiar with.)
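A crude stand-in for an "excessive predictability" detector is a general-purpose compressor. The Python sketch below uses `zlib` for this; that choice, the helper name `compressed_size`, the seed, and the sequence length are all assumptions for illustration, and compressed size is only a loose proxy for the kind of complexity discussed earlier in the thread.

```python
import random
import zlib

def compressed_size(flips: str) -> int:
    """Bytes zlib needs to store the flip string (a crude predictability score)."""
    return len(zlib.compress(flips.encode("ascii"), 9))

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 1000
random_looking = "".join(rng.choice("HT") for _ in range(n))
all_heads = "H" * n
alternating = "HT" * (n // 2)

for name, flips in [("random-looking", random_looking),
                    ("all heads", all_heads),
                    ("alternating", alternating)]:
    print(f"{name:15s} {compressed_size(flips)} bytes")
```

All three strings are equally probable under fair flipping, but the repetitive ones compress to far fewer bytes, which is roughly the signal such a heuristic would react to.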

• It is not true that overall all sequences are equally likely. The probability of a certain sequence is the probability that it would happen by chance added to the probability that it would happen by not-chance. As gjm said in his comment, the chance part is equal, but the non-chance part is not. So there is no reason why the total probability of all sequences would be equal. The total probability of a sequence of 100 heads is higher than most other sequences. For example, there is the non-chance method of just talking about a sequence without actually getting it. We’re doing that now, and note that we’re talking about the sequence of all heads. That was far more likely given this method of choosing a sequence, than an individual random looking sequence.

(But you are right that it is no more improbable than other sequences. It is less improbable overall, and that is precisely why we start looking for another explanation.)

• No, that’s a very good reason to start looking for another explanation, but somebody with no understanding of Bayes’ Rule at all would do exactly the same thing. If somebody else would engage in exactly the same behavior with a radically different explanation for that behavior, given a particular stimulus—consider the possibility that your explanation for your behavior is not the real reason for your behavior.

• I agree that Pr(Alice cheated) is likely higher than Pr(these coin-flip results on this occasion), but that’s the wrong comparison.

Why?

Pr(Alice cheated to produce these coin-flip results) is typically about 2^-n times Pr(Alice cheated), and in particular is generally smaller than Pr(Alice got these coin-flip results fairly).

“Typically” and “Generally” are doing all the heavy lifting there. Imagine writing an AI to guess the probability that Alice cheated, given a sequence. What rules would you apply?

• Why?

Because if you write down the Bayes’ Rule calculation, that’s not the ratio that appears in it.

“Typically” and “Generally” are doing all the heavy lifting there.

Nope. They both mean: for large n, for a fraction of sequences that tends to 1 as n → infinity, that’s what happens.

• Because if you write down the Bayes’ Rule calculation, that’s not the ratio that appears in it.

HTTTHHHTHTHTTHHTTHTHTTHTHHHTHHTTTHTH. Using Bayes’ Rule, what are the odds I actually got that sequence, as opposed to randomly typing letters? (If you miss my point: You’re misusing Bayes’ Rule in this argument.)

Nope. They both mean: for large n, for a fraction of sequences that tends to 1 as n → infinity, that’s what happens.

If Alice cheats 100% of the time, your formula produces probabilities greater than 1 for any n less than infinity, which I’m reasonably certain doesn’t happen.

• Using Bayes’ Rule, what are the odds I actually got that sequence, as opposed to randomly typing letters?

Pretending for the sake of argument that I don’t see any regularities in your sequence that I wouldn’t expect from genuinely random coin flips (it actually looks to me more human-generated, but with only n=36 I’m not very confident of that): the odds are pretty much the same as the prior odds that you’d actually flip a coin 36 times rather than just writing down random-looking Hs and Ts.

You’re misusing Bayes’ Rule in this argument.

I think you may be misunderstanding my argument.

If Alice cheats 100% of the time, your formula produces probabilities greater than 1

The only formula I wrote down was “2^-n times Pr(Alice cheated)” and those probabilities are definitely not greater than 1. Would you care to be more explicit?

• Pretending for the sake of argument that I don’t see any regularities in your sequence that I wouldn’t expect from genuinely random coin flips (it actually looks to me more human-generated, but with only n=36 I’m not very confident of that): the odds are pretty much the same as the prior odds that you’d actually flip a coin 36 times rather than just writing down random-looking Hs and Ts.

You said something interesting there, and then skipped right past it. That’s the substance of the question. You don’t get to ignore those regularities; they do, in fact, affect the probabilities. Saying that they don’t appear in the ratio of Bayes’ Rule is… well, misusing Bayes’ Rule to discard meaningful evidence.

The only formula I wrote down was “2^-n times Pr(Alice cheated)” and those probabilities are definitely not greater than 1. Would you care to be more explicit?

2^(-n) approaches 1 as n approaches infinity, but for any finite n, is greater than 1. Multiply that by a probability of 1, and you get a probability greater than 1. [ETA: Gyah. It’s been too long since I’ve done exponents (literally, ten years since I’ve done anything interesting). You’re right, I’m confusing negative exponents with division in exponents.]

• Saying that they don’t appear in the ratio of Bayes’ Rule [...]

But I didn’t say that. I didn’t say anything even slightly like that.

This is at least partly my fault because I was too lazy to write everything out explicitly. Let me do so now; perhaps it will clarify. Suppose X is some long random-looking sequence of n heads and tails.

Odds(Alice cheated : Alice flipped honestly | result was X) = Odds(Alice cheated : Alice flipped honestly) . Odds(result was X | Alice cheated : Alice flipped honestly).

The second factor on the RHS is, switching from my eccentric but hopefully clear notation to actual probability ratios, Pr(result was X | Alice cheated) /​ Pr(result was X | Alice flipped honestly).

So those two probabilities are the ones you have to look at, not Pr(Alice cheated) and Pr(result was X | Alice flipped honestly). But the latter is what you were comparing when you wrote

it’s -always- more likely that Alice cheated than that this particular sequence came up by chance.

which is why I said that was the wrong comparison.
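The odds calculation can be sketched in Python under a deliberately simple, assumed model of how a cheating Alice picks her sequence: with probability `q` she reports one of two "distinguished" sequences (all heads or all tails), otherwise she writes down a uniformly random one. The model, `q`, and the function names are hypothetical and only serve to make the two likelihoods concrete.

```python
from fractions import Fraction

DISTINGUISHED = ("H", "T")  # all-heads and all-tails: an assumed toy set

def pr_given_cheated(x: str, q: Fraction) -> Fraction:
    """Pr(result was x | Alice cheated) under the toy cheater model."""
    n = len(x)
    uniform_part = (1 - q) * Fraction(1, 2 ** n)
    is_distinguished = any(x == c * n for c in DISTINGUISHED)
    special_part = q / len(DISTINGUISHED) if is_distinguished else Fraction(0)
    return uniform_part + special_part

def posterior_odds_cheated(x: str, prior_odds: Fraction, q: Fraction) -> Fraction:
    """Odds(cheated : honest | x) = prior odds * likelihood ratio."""
    pr_given_honest = Fraction(1, 2 ** len(x))
    likelihood_ratio = pr_given_cheated(x, q) / pr_given_honest
    return prior_odds * likelihood_ratio

prior_odds = Fraction(1, 100)  # assumed prior odds that Alice cheats
q = Fraction(1, 2)             # assumed taste for "distinguished" sequences

print(posterior_odds_cheated("HTHHTHHTTHTTHTHHHHTH", prior_odds, q))
print(posterior_odds_cheated("H" * 20, prior_odds, q))
```

With these assumed numbers the random-looking sequence moves the odds of cheating from 1:100 to 1:200, since the 2^-n factor cancels in the ratio, while 20 heads moves them to roughly 2621:1.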

• After a little more research, the issue appears to be related to Bonferroni inequalities (where we treat every single possible sequence, or explanation for an unusual sequence, as a hypothesis, on a data set of one), and although this is far outside my expertise in statistics, I suspect something -like- a Bonferroni correction might resolve the paradox.

• The probability of getting some heads/tails sequence is near 1 (cuz it could land on its edge). The probability of predicting said sequence beforehand is extremely low.

The probability of someone winning the lottery is X, where X = the % of the possible ticket combinations sold. The probability of you winning the lottery with a particular set of numbers is extremely low.

As far as we can tell, and with the exception of the Old Testament heroes, the probability of someone living to be 500 years old is much lower than winning most lotteries or predicting a certain high number of coin flips, though I suppose a smart ass could devise some exceptions to either. We’d have to better define “vampire” to arrive at a probability for that bit.

A house being haunted by real ghosts is actually extremely probable, depending on the neighborhood.

• This is the explanation closest to what I was thinking beforehand. The problem seems like one of the difference between {the difficulty of predicting an event} and {the likelihood of correctly reporting an observed event}. I think Dagon’s argument about Map vs. Territory is a good one too.

Question for you, though… please define “ghost”? I have a feeling your definition is different than mine because I find events such as

certain environmental factors like (low-level poisoning from radon, carbon monoxide, et al; certain acoustic effects; certain architectural events such as uneven expansion due to temperature changes; &c.) cause minor hallucinations or illusions resulting in supersocial minds like those in humans perceiving “people” where there are none

very much more likely than

something of a person that exists independent of the usual corporeal form and (typically) despite the loss of that form is detectable by an uninformed and objective observer.

• I was joking about ghosts… :)

It’s a good point, though. I think minor hallucinations and illusions are a much more probable explanation for ghosts—and lots of other alleged paranormal/supernatural phenomena—than anything authentic.

• two ways to approach this, depending on which direction Bob takes the argument.

1) Alice seems to be accepting Bob’s implication that probability exists in reality, rather than only in the minds and models we use to predict it

A fair bit of recent discussion can be found in this thread, and in the sequences in Probability is in the Mind.

I’d summarize as “the probability of something that happened is 1”. There’s no variance in whether or not it will occur. If you like, you can add uncertainty about whether you know it happened, but a lot of things approach 1 pretty quickly.

The probability of future experiences will be 1 or 0 at some point, but for now, different agents assign likelihoods based on their priors. The probability that the next sequence of flips will be exactly this is, in the world, 0 or 1 - what will be will be. The probability that an agent can use to predict this experience is 1/2^n. That probability expectation changes as evidence is added, such as the evidence of seeing some or all of the flips.

alternate way of showing this: ask Bob to show his update after each of the N flips. The prior is indeed 1/​2^n, but each observation makes it twice as likely.
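The flip-by-flip update can be written out as a small Python sketch. It assumes a fair coin and perfectly reliable observation of each flip; `running_probabilities` is a name invented here, and `observed` is assumed to be no longer than `target`.

```python
from fractions import Fraction

def running_probabilities(target: str, observed: str) -> list:
    """Probability that the full run will equal `target`, after each observed flip.

    The first entry is the prior 1/2^n; each later entry is the updated value.
    """
    n = len(target)
    probs = [Fraction(1, 2 ** n)]
    still_matching = True
    for i, flip in enumerate(observed):
        still_matching = still_matching and flip == target[i]
        remaining = n - (i + 1)  # flips not yet seen
        probs.append(Fraction(1, 2 ** remaining) if still_matching else Fraction(0))
    return probs

print(running_probabilities("HTH", "HTH"))
print(running_probabilities("HTH", "HH"))
```

Each matching flip doubles the probability until it reaches 1 once the whole sequence has been seen, and a single mismatch sends it to 0.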

2) Bob’s just wrong if his priors give a significantly higher chance to supernatural physics violations than to winning the lottery. This is probably some form of scope insensitivity. For both ghosts and lottery-winning, I do assign a much higher chance that I’m being tricked than that it actually happened, but if the question comes up, I can gather evidence to change those ratios.

• Bob is lumping all events of low probability in the same category, without distinguishing between “not too likely, but still could happen someday” and “ridiculously unlikely, why are you even considering it” events.

• The issue here is not different degrees of probability, but the probability of the testimony.

• Nothing is wrong with this picture—it’s just Bob trolling Alice :-)

• Alice should avoid at all costs being drawn onto Bob’s turf. There are several ways to avoid this.

• I don’t know about that. I think Alice should just troll him right back.

For example, she might consider more… interesting clothing and better makeup X-)

• People need this

https://en.wikipedia.org/wiki/Axiom

in order to avoid “paralysis” and go further.