Reference Classes

Epistemic Status: Just some thoughts off the top of my head

Fake Nous recently featured an article on agent-centered evidence:

Sue had a premonition about the flight, and then the plane crashed. For Sue, that’s pretty strong evidence of precognition. We would completely understand Sue’s resolution to never get on a plane that she has a bad feeling about; this would not seem unreasonable at all. But for third parties, it’s not very convincing. Is it?
… This event is a biased sample from the class of stuff that happens. The reason I heard about this story is that something weird happened – if Sue had a premonition that was completely wrong, then the story wouldn’t get repeated and I wouldn’t have heard about it. Furthermore, since there have been billions of people in the world, I should initially expect that some things like this would have happened, even if there were no precognition or ESP.” But when Sue herself experiences the event, she shouldn’t say that. To her, her own life is not a biased sample.
… That seems to make sense. Two people can get “the same evidence” but by a different evidence-collection method, and of course that can affect the significance of the evidence… There is still something weird about this, though, because Sue knows how the situation looks to third parties, and they know how the situation looks to her. Both seemingly know the same facts. The third parties know that Sue’s experience is not a biased sample to her. She knows that her experience is, to other people, just the experience of one among the 7 billion people on earth, and not particularly remarkable to them.
… Another question: what about people who know Sue personally? But if Sue is a member of your immediate family, you might say, “There are 4 members of my family… And if so, why couldn’t we extend this to Sue’s barista at Starbucks? The barista could say, “I have had only 100 customers today, of whom one had a precognition-like experience”, which sounds fairly remarkable.

I actually think that there is a form of agent-centered evidence, intuition, which isn’t easily communicable to other agents and can often prove fairly useful. However, that’s not the issue I want to discuss today. Instead, I want to talk about reference classes.

But before I cover that, let’s talk about shots. Suppose we have a raffle with one prize and a thousand tickets. If you have one ticket, you have one shot, while if you have ten tickets, you have ten shots. I haven’t precisely defined the term, but I think this example should be clear enough. Once you know the number of shots, you can turn it into a probability: one ticket gives you a 1 in 1,000 chance of winning, while ten tickets give you a 1 in 100 chance.
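As a minimal sketch of turning shots into a probability for the raffle, assuming every ticket is equally likely to win (the function name `win_probability` is just made up for illustration):

```python
from fractions import Fraction


def win_probability(shots: int, total_tickets: int) -> Fraction:
    """Chance of holding the winning ticket in a one-prize raffle.

    Assumes each ticket is equally likely to be drawn.
    """
    if not 0 <= shots <= total_tickets:
        raise ValueError("shots must be between 0 and total_tickets")
    return Fraction(shots, total_tickets)


print(win_probability(1, 1000))   # 1/1000
print(win_probability(10, 1000))  # 1/100
```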

Let’s suppose that you have four members in your family and that you also have four work colleagues. If one of your family members has a precognition-like experience, you might say that it is remarkable, as you only had four shots. But suppose that in the counterfactual where one of your colleagues had such an experience, you would also have counted it as four shots. This seems like a mistake: only one of the two groups can be counted as four shots, and if the experience occurs in the other group, it has to be counted as eight shots, with the group being classed as family AND colleagues.

If you have a bunch of different groups, say three family members, five work colleagues and eight cousins, then you can order them arbitrarily. It doesn’t matter if you do (3,5,8) or (5,3,8) or (8,3,5); any of them is fine. The number of shots for a member of a group is then the running total up to and including that group. If you order them (8,3,5), that is 8 for the first group, 11 for the second and 16 for the third.
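The running totals are easy to compute for any ordering. Here is a small sketch (the helper name `shots_per_group` is my own, not anything standard):

```python
from itertools import accumulate


def shots_per_group(ordered_sizes):
    """Cumulative number of shots charged to each group, in the chosen order."""
    return list(accumulate(ordered_sizes))


print(shots_per_group((8, 3, 5)))  # [8, 11, 16]
print(shots_per_group((3, 5, 8)))  # [3, 8, 16]
print(shots_per_group((5, 3, 8)))  # [5, 8, 16]
```

Whatever the ordering, the last group always ends up with all sixteen shots; the ordering only changes how many shots the earlier groups are charged.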

This might seem strange. We are calculating a different probability of psychic powers existing when an event happens to a member of your family vs. one of your colleagues, even though there doesn’t seem to be any fundamental reason written into the universe itself why one group should give you more evidence.

Then again, the probability you assign to something existing is really more about your subjective state, such as the information available to you and how it was generated, than about the objective state of the universe itself. We can think about choosing how to order your groups the same way that we think about committing to a (frequentist) experiment design in advance. It’s well understood that if you test more hypotheses, you increase your chance of a spurious result. For example, if you test for effects in male adults, female adults, male children and female children, you’ve taken four shots as opposed to having only tested for an overall effect. This is typically adjusted for by using a significance threshold that isn’t constant, but instead depends on the number of hypotheses that you are testing.
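To make the four-subgroup example concrete, here is a rough sketch using the Bonferroni correction, which is one common way of making the threshold depend on the number of tests (I’m picking it for simplicity, and other corrections exist). The 0.05 threshold and the assumption that the four tests are independent are illustrative choices of mine.

```python
def familywise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one spurious 'significant' result.

    Assumes the tests are independent and every null hypothesis is true.
    """
    return 1 - (1 - alpha) ** num_tests


def bonferroni_threshold(alpha: float, num_tests: int) -> float:
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / num_tests


ALPHA = 0.05   # illustrative overall significance level
SUBGROUPS = 4  # male adults, female adults, male children, female children

# Uncorrected: four shots at a spurious result, about 0.185 instead of 0.05.
print(familywise_error_rate(ALPHA, SUBGROUPS))

# Corrected: each subgroup test must clear a stricter bar.
per_test = bonferroni_threshold(ALPHA, SUBGROUPS)
print(per_test)
print(familywise_error_rate(per_test, SUBGROUPS))  # back under 0.05
```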

We can take this analogy further. Suppose in the example above, we choose the ordering (5, 3, 8). Then this defines three experiments: work colleagues only with five shots, work + family with eight shots, and all groups with sixteen shots. If we observe one of our family members having such an experience, we can treat it as us having pre-committed to an experiment covering family and work colleagues with eight shots. This is far better than the naive tendency we might have to define the group as just family with three shots.

However, it still isn’t a completely accurate way to handle probability: if we want as accurate an estimate of psychic ability as possible, then we should take into account all the evidence available. So if we also know whether our cousins have had such experiences, then we really should take that into account when calculating the probability. Of course, trying to figure out this implicit sample might greatly complicate the calculation, which is why this group-based approximation is much more appealing.

That said, this is quite an unusual approximation, as it can result in completely different probabilities than if we had the whole data. For example, observing one positive among five shots instead of one positive among sixteen makes a huge difference to the resulting probabilities. Nonetheless, if you had precommitted to making a decision based on the first five, then the increase in probability when you saw a positive result would be exactly balanced, in expectation, by the decrease when you saw a negative result. This means that deciding in advance to only look at the first five wouldn’t bias the result, even if it throws away data.
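Here is a toy Bayesian sketch of both points. Everything numeric in it is an assumption I’ve made up for illustration: a prior of 1% that precognition is real, a 5% per-person chance of a precognition-like experience if it is real and 1% if it isn’t, with people treated as independent. A “positive” here means at least one person in the group had such an experience.

```python
def p_any_hit(rate: float, shots: int) -> float:
    """Chance that at least one of `shots` independent people has a hit."""
    return 1 - (1 - rate) ** shots


def posterior(prior, shots, rate_if_true, rate_if_false, saw_hit):
    """Bayes' rule for the hypothesis after a positive or negative result."""
    like_true = p_any_hit(rate_if_true, shots)
    like_false = p_any_hit(rate_if_false, shots)
    if not saw_hit:
        like_true, like_false = 1 - like_true, 1 - like_false
    evidence = prior * like_true + (1 - prior) * like_false
    return prior * like_true / evidence


def expected_posterior(prior, shots, rate_if_true, rate_if_false):
    """Posterior averaged over both outcomes, weighted by their probability."""
    p_hit = (prior * p_any_hit(rate_if_true, shots)
             + (1 - prior) * p_any_hit(rate_if_false, shots))
    up = posterior(prior, shots, rate_if_true, rate_if_false, True)
    down = posterior(prior, shots, rate_if_true, rate_if_false, False)
    return p_hit * up + (1 - p_hit) * down


# All three numbers below are hypothetical, chosen only for illustration.
PRIOR, RATE_IF_TRUE, RATE_IF_FALSE = 0.01, 0.05, 0.01

for shots in (5, 16):
    up = posterior(PRIOR, shots, RATE_IF_TRUE, RATE_IF_FALSE, True)
    down = posterior(PRIOR, shots, RATE_IF_TRUE, RATE_IF_FALSE, False)
    avg = expected_posterior(PRIOR, shots, RATE_IF_TRUE, RATE_IF_FALSE)
    print(f"{shots} shots: positive -> {up:.4f}, "
          f"negative -> {down:.4f}, expected -> {avg:.4f}")
```

Under these made-up numbers, a positive among five shots moves the posterior more than a positive among sixteen, yet in both cases the expected posterior comes back to the prior, which is the sense in which precommitting to a group doesn’t bias the result.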

Perhaps a more realistic scenario is one where you precommit to expanding the experimental group until you hit a positive result or you’ve expanded it to the end. This would represent the fact that someone might not worry about how amazing it is that someone in their church had a particular experience if someone in their family had such an experience. These expansionary scenarios are too complex to handle using the shots framework, but even in this scenario the maths isn’t too hard.
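As one hedged sketch of that maths: if we assume each person independently has the same per-person chance of such an experience (an assumption of mine, as is the 1% rate below), then the chance that the expansion stops at each group can be computed directly.

```python
def first_hit_distribution(ordered_sizes, rate):
    """Chance that the first positive appears in each group, in order.

    Returns (list of per-group probabilities, probability of no hit at all).
    Assumes every person independently has the same chance `rate` of a hit.
    """
    stop_probs = []
    p_no_hit_so_far = 1.0
    for size in ordered_sizes:
        p_group_misses = (1 - rate) ** size
        # We stop here only if all earlier groups missed and this one hits.
        stop_probs.append(p_no_hit_so_far * (1 - p_group_misses))
        p_no_hit_so_far *= p_group_misses
    return stop_probs, p_no_hit_so_far


# Hypothetical ordering (colleagues, family, cousins) and per-person rate.
stop_probs, p_never = first_hit_distribution((5, 3, 8), rate=0.01)
for size, p in zip((5, 3, 8), stop_probs):
    print(f"group of {size}: stop here with probability {p:.4f}")
print(f"never stop: {p_never:.4f}")
```

Running this once for the rate you’d expect if precognition is real and once for the rate you’d expect if it isn’t gives the likelihoods needed to update on where the expansion stopped.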

Of course, a lot of the time we aren’t deciding in advance, but are instead deciding after the fact. In this case, your ability to use these schemes is highly dependent on your ability to self-model, that is, to work out honestly which scheme you would have committed to beforehand. If you can do this well, then you can adopt these schemes after the fact, but if you do it poorly, it’ll completely mess up the results.