In a nutshell, the simplified problem in the post was this: You have a hotel with green and red rooms, 4 of one color and 1 of the other. Each observer guesses that their own room color is the majority. If you ask an observer at random which case they think it is, on average they will be correct 80% of the time. However, if you only ever ask someone in a green room, they will be correct only 50% of the time.
(Here’s the detailed explanation. Skip if you prefer.)
Suppose you ask a random person. On average, 10 trials would look like this:
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: red-room guy says ‘4 red’ and is incorrect
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: red-room guy says ‘4 red’ and is correct
RRRRG: green-room guy says ‘4 green’ and is incorrect
On average, the observers are correct 80% of the time because the frequency of red versus green observers carries information about the true distribution.
Suppose you ask a person in a green room. In this case, 10 trials on average would look like this:
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
GGGGR: green-room guy says ‘4 green’ and is correct
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
RRRRG: green-room guy says ‘4 green’ and is incorrect
Now the observers are correct only 50% of the time, because the colors of the people you ask no longer reflect the true distribution. You skewed the frequency of green roomers by pre-selecting green.
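If you would rather check the two cases numerically than read the trial lists, here is a rough simulation sketch. It assumes, as the trial lists do, that the two hotel layouts (4 green + 1 red, or 4 red + 1 green) are equally likely, and that every observer guesses their own room color is the majority. The function name `simulate` and its parameters are just made up for this illustration.

```python
import random


def simulate(trials, select_green_only, seed=0):
    """Return the fraction of asked observers whose guess is correct.

    Assumption (mine, matching the trial lists): each hotel layout,
    4 green + 1 red or 4 red + 1 green, occurs with probability 1/2.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        # Pick which color is the majority (4 rooms) for this trial.
        majority = rng.choice(["G", "R"])
        minority = "R" if majority == "G" else "G"
        rooms = [majority] * 4 + [minority]

        if select_green_only:
            # Pre-selection: always ask a green-room occupant.
            # Every layout has at least one green room, so this is always possible.
            asked = "G"
        else:
            # Ask one of the five occupants uniformly at random.
            asked = rng.choice(rooms)

        # Each observer guesses that their own color is the majority.
        if asked == majority:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    print("random observer:    ", simulate(100_000, select_green_only=False))
    print("green-room observer:", simulate(100_000, select_green_only=True))
```

With enough trials the first number should land near 0.8 and the second near 0.5. The only difference between the two runs is how the person being asked gets selected.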
This was my summary solution to the problem:
When a person is asked to make a prediction based on their subjective observation, they should agree only if they were chosen at random, independently of their room color. If they were chosen after the assignment, conditional on having a certain outcome, they should recognize this as [what we might call] pre-selection bias. Your prediction is meaningful only if it could have gone either way.