You may want to link to this post. There were a few different descriptions of why that problem (a very similar one) didn’t work, and the one I pinpointed as “the pointer problem” is more or less the same as the one you pinpointed.
The flaw here is that we conditioned on person x existing, but person x only became of interest after we saw that they existed (peeked at the data).
Perhaps we could call the error a “pre-selection bias”?
(In a separate daughter comment, I’ll summarize the anthropic problem described in the other post, emphasizing the similarity of the problems and the solutions.)
In a nutshell, the simplified problem in the post was this: You have a hotel with green and red rooms, 4 of one color and 1 of the other. If you ask an observer at random which case they think it is, on average they will be correct 80% of the time. However, if you only ever ask someone in a green room, they will be correct only 50% of the time.
(Here’s the detailed explanation. Skip if you prefer.)
Suppose you ask a random person. 10 trials would look like this on average:
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: red-room guy says ‘4 red’ and is incorrect
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: green-room guy says ‘4 green’ and is incorrect
On average, the observers are correct 80% of the time because the frequency of red versus green observers is information about the true distribution.
Suppose you ask a person in a green room. In this case, 10 trials on average would look like this:
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
Now, the observers are only correct 50% of the time because their distribution doesn’t reflect the true distribution. You skewed the frequency of green roomers by pre-selecting green.
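For anyone who wants to check the two numbers, here is a minimal sketch in Python that enumerates the cases exactly rather than sampling. It assumes the two hotel layouts are equally likely and that every observer guesses their own room color is the majority; the function and variable names are made up for illustration and are not from the original post.

```python
from fractions import Fraction

# Assumption: the two layouts are equally likely a priori.
HOTELS = {"4 green": "GGGGR", "4 red": "RRRRG"}


def guess(room):
    # Assumption: each observer guesses that their own color is the majority.
    return "4 green" if room == "G" else "4 red"


def accuracy_random_observer():
    """Pick a hotel, then pick one of its five occupants uniformly at random."""
    total = Fraction(0)
    for truth, rooms in HOTELS.items():
        for room in rooms:
            weight = Fraction(1, len(HOTELS)) * Fraction(1, len(rooms))
            if guess(room) == truth:
                total += weight
    return total


def accuracy_green_observer():
    """Pick a hotel, then always ask someone in a green room (pre-selection)."""
    total = Fraction(0)
    for truth in HOTELS:
        # Every layout has at least one green room, so a green-room
        # occupant can always be found, and they always say '4 green'.
        if guess("G") == truth:
            total += Fraction(1, len(HOTELS))
    return total


print(accuracy_random_observer())  # 4/5
print(accuracy_green_observer())   # 1/2
```

The only difference between the two functions is whether the room color of the person asked is allowed to vary with the layout. Once it is fixed in advance, the answer carries no information about which hotel you are in.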
This was my summary solution to the problem:
When a person is asked to make a prediction based on their subjective observation, they should agree only if they were chosen at random, independently of their room color. If they were chosen after the assignment, dependent upon having a certain outcome, they should recognize this as [what we might call] pre-selection bias. Your prediction is meaningful only if it could have gone either way.
Thanks for directing me to that post.
I think calling it ‘pre-selection bias’ makes sense. It would be good to have a name for it, as it’s an error that is common and easy to miss.