I agree that there is a lot of truth to that viewpoint. However, this is difficult to formalize with learning. The issue is that most existing learning theory assumes that you already have data on the full set of classes, but if you do that, then everything becomes in-distribution.
I think that for practical purposes, what often matters is that this other process generates data points with very different downstream effects. This matches what you bring up with P(outcome that benefits the player); the downstream effects of 666666… are very different from most other dice roll sequences.
I agree that there is a lot of truth to that viewpoint. However, this is difficult to formalize with learning. The issue is that most existing learning theory assumes that you already have data on the full set of classes, but if you do that, then everything becomes in-distribution.
I think that for practical purposes, what often matters is that this other process generates data points with very different downstream effects. This matches what you bring up with P(outcome that benefits the player); the downstream effects of 666666… are very different from most other dice roll sequences.