The highest-probability outcome can be out of distribution

“Out of distribution” (OOD) describes a data point x that does not come from the distribution you had in mind. Here are some examples of what might be considered OOD:

  • You play a dice game against a cheater, and the cheater uses a die that has been weighted to always roll six. Thus their rolls look like 666666..., whereas your rolls look like 312265....

  • You order an apple from a store and the store decides to drive a truck into your house in order to murder you.

  • You are reading an interesting post on LessWrong, and th2u
    mCXh4:xaapdQA
    r bKv4ztPktiR)pYpXwT-j
    %?:Ta4Or
    XXdicVAQ?;DHJFh?U4wLW! iCmVD~z1iQT5cB4x&J_A-2MmUxCCxDdA3hYXjomA “m5mID;9-C9~=%~-j6ba”2N!fV~jikXZOQ2iEsnNxR0nggVYR:ZF:9&p
    m#R”Xf
    j~=vFiM_N(&~M%8amq2q5z5

The second and third examples involve something unlikely or surprising happening, which suggests that being “out of distribution” is about being low-probability. But consider the first example: with fair dice, every sequence of rolls of a given length has the same probability, so 666666 is exactly as likely as 312265. Low probability therefore cannot account for the first example.

In fact, we can construct a scenario where a point is “out of distribution” despite being the highest-probability point in the distribution. Imagine a gambling hall where everyone is expected to cheat, playing with dice that have a 58% chance of rolling a 6. In that case, an ordinary roll sequence would look like 624646662443.... However, if someone got a roll sequence full of 6s, like 666666666666..., then their rolls would be out of distribution, and it would look like they were cheating beyond what the accepted weighted dice allow.

Conceptually, what is happening here is that even though 666666666666… is the most likely roll sequence, there are so many other sequences of rolls that have fewer 6s that their lower likelihood is counterbalanced (and then some) by their greater number. The sum of the probabilities of other sequences ends up being greater than the probability of 666666666666....
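To make the counting argument concrete, here is a minimal Python sketch. It assumes sequences of exactly 12 rolls (the length of the examples shown; the scenario itself does not fix a length) and uses the 58% figure from above. The function names are purely illustrative.

```python
from math import comb

P_SIX = 0.58   # chance of a 6 on each weighted die (from the scenario above)
N_ROLLS = 12   # assumption: sequence length, matching the examples shown


def prob_all_sixes(n: int = N_ROLLS, p: float = P_SIX) -> float:
    """Probability of the single sequence consisting only of 6s."""
    return p ** n


def prob_num_sixes(k: int, n: int = N_ROLLS, p: float = P_SIX) -> float:
    """Total probability of all sequences containing exactly k sixes."""
    # comb(n, k) counts where the sixes can fall; which non-six faces show
    # up does not affect this total, so no further assumption is needed.
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


if __name__ == "__main__":
    print(f"P(all {N_ROLLS} rolls are 6) = {prob_all_sixes():.4%}")
    for k in range(N_ROLLS + 1):
        print(f"P(exactly {k:2d} sixes)  = {prob_num_sixes(k):.4%}")
```

Under these assumptions, the all-6 sequence comes out well under 1%, and the most likely number of 6s is 7 out of 12 rather than 12 out of 12.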

However, this doesn’t tell the full story. After all, the sum of the probabilities of all roll sequences other than 624646662443… is also much greater than the probability of 624646662443… itself. Any one specific roll sequence is unlikely, so why do we consider some to be “out of distribution” and others not?
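The puzzle can be seen numerically by comparing the two specific sequences directly. The sketch below uses the 58% figure from the scenario and additionally assumes that the five non-6 faces are equally likely (8.4% each), which the scenario does not specify. The helper name `sequence_probability` is hypothetical.

```python
P_SIX = 0.58                 # chance of a 6 on each weighted die (from the scenario)
P_OTHER = (1 - P_SIX) / 5    # assumption: the other five faces are equally likely


def sequence_probability(rolls: str) -> float:
    """Probability of one exact sequence of rolls, e.g. "624646662443"."""
    prob = 1.0
    for face in rolls:
        prob *= P_SIX if face == "6" else P_OTHER
    return prob


ordinary = "624646662443"          # the "ordinary" example sequence
all_sixes = "6" * len(ordinary)    # the suspicious-looking sequence

if __name__ == "__main__":
    p_ordinary = sequence_probability(ordinary)
    p_all_sixes = sequence_probability(all_sixes)
    print(f"P({ordinary})  = {p_ordinary:.3e}")
    print(f"P({all_sixes}) = {p_all_sixes:.3e}")
    print(f"ratio = {p_all_sixes / p_ordinary:,.0f}")
```

Under these assumptions, the ordinary-looking sequence is several hundred thousand times less likely than the all-6 sequence, yet it is the all-6 sequence that looks out of distribution.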

This is probably not a fully solved problem. Perhaps in a later post, I will give one piece that I believe to be relevant. But until then, here’s a relevant meme:

[Image: meme]

Thanks to Justis Mills for proofreading and feedback.