My guess is that when people point to “OOD” stuff in the real world, it’s really a pointer to “things unlikely enough that another hypothesis is more probable”. A series of 66666… in the last scenario you describe is exactly as probable as any other specific series of dice rolls, but when you compare it in terms of P(outcome that benefits the player) and P_prior(player is cheating), the reaction you describe as “calling OOD” seems to come from the former belonging to a class unlikely enough that the posterior of the latter crosses some threshold, one that lets you call out cheaters without being overly sensitive. The generalization to other settings would then probably be something like reading “out of distribution” as “I suspect something else is at play here”.
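A minimal sketch of that posterior-threshold view, under assumed illustrative numbers: a two-hypothesis model where the alternative to a fair die is a hypothetical loaded die that rolls 6 with probability 0.9, and the prior on cheating is 0.01. Every specific sequence is equally likely under the fair die, yet an all-sixes run pushes the cheating posterior past almost any threshold:

```python
def posterior_cheating(rolls, p_loaded_six=0.9, prior_cheat=0.01):
    """P(cheating | rolls) for a fair die vs. one loaded die hypothesis.

    The loaded-die parameters and the prior are illustrative assumptions,
    not part of the original argument.
    """
    # Likelihood of this exact sequence under each hypothesis.
    p_fair = (1 / 6) ** len(rolls)
    p_cheat = 1.0
    for r in rolls:
        # Loaded die: 6 with prob p_loaded_six, the rest split evenly.
        p_cheat *= p_loaded_six if r == 6 else (1 - p_loaded_six) / 5
    # Bayes' rule over the two hypotheses.
    return (prior_cheat * p_cheat) / (
        prior_cheat * p_cheat + (1 - prior_cheat) * p_fair
    )

print(posterior_cheating([6, 6, 6, 6, 6, 6]))   # all sixes: posterior near 1
print(posterior_cheating([1, 4, 2, 6, 3, 5]))   # mixed rolls: posterior stays tiny
```

Both sequences have identical probability under the fair-die hypothesis; only the class the first one falls into (outcomes that benefit the player, and that the loaded-die hypothesis predicts well) drives the posterior up.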
I agree that there is a lot of truth to that viewpoint. However, it is difficult to formalize in a learning setting. The issue is that most existing learning theory assumes you already have data on the full set of classes, and once you assume that, everything becomes in-distribution.
I think that for practical purposes, what often matters is that this other process generates data points with very different downstream effects. This matches what you bring up with P(outcome that benefits the player); the downstream effects of 666666… are very different from most other dice roll sequences.