This post is about two problems. The first half of the post is about errors caused by using averages. This has a simple solution: don’t use averages. The second half of the post is about understanding tail risk. This is a hard problem, but I don’t think it has much to do with the first problem.
Just don’t use averages. Why do you care about averages?
In almost all situations, the median is a better single number summary of a random variable. For example, the median sex ratio is more salient than the mean sex ratio. Mean temperature change is irrelevant. Median temperature change is important. More important is the probability of a catastrophically large change. So answer that question. And answer both questions! You can almost always afford more than a single number to summarize your random variable.
Once you know that the question is the size of the tail, it should be clear that the mean is irrelevant. Even if you do care about the mean, as you do by fiat in the random-series example, it is better not just to sample from it, but to understand the whole distribution. If you do that, you’ll see that it is skewed, which should make you nervous.
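To make this concrete, here is a small simulation sketch (my own illustration, not from the original discussion). It uses a lognormal as a stand-in for some skewed quantity, and an arbitrary threshold of 10 as "catastrophically large":

```python
import random

random.seed(0)

# Illustrative skewed distribution: lognormal, so most draws are
# modest but a few are very large.
samples = sorted(random.lognormvariate(0, 1) for _ in range(100_000))

mean = sum(samples) / len(samples)
median = samples[len(samples) // 2]
# Probability of a "catastrophically large" value -- the threshold 10
# is arbitrary, a stand-in for whatever counts as a catastrophe.
p_tail = sum(1 for x in samples if x > 10) / len(samples)

print(f"mean      = {mean:.2f}")    # dragged up by the right tail
print(f"median    = {median:.2f}")  # the typical draw
print(f"P(X > 10) = {p_tail:.4f}")
```

The mean lands well above the median because the right tail drags it up, and neither number alone tells you the tail probability — which is why reporting more than one summary is cheap insurance.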
Both parts are about the hazards of trusting results from computer simulations. Putting the two parts together shows the connection between divergent series and tail risk: when a series inside the computation changes from convergent to divergent, tail risk appears, and the simulation’s trustworthiness degrades suddenly, yet in a way that may be difficult to notice.
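The sudden degradation can be demonstrated with a heavy-tailed toy model (again my own sketch, not from the post). A Pareto distribution has a finite mean when its tail index alpha is above 1 and an infinite mean when it is at or below 1, so the same Monte Carlo code silently stops converging as alpha crosses that line:

```python
import random

random.seed(1)

def pareto_draw(alpha):
    # Inverse-CDF sample from a Pareto distribution with minimum 1:
    # P(X > x) = x ** -alpha for x >= 1.
    u = 1.0 - random.random()  # uniform in (0, 1]
    return u ** (-1.0 / alpha)

def running_means(alpha, n=200_000, checkpoints=10):
    # Record the running sample mean at evenly spaced checkpoints.
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += pareto_draw(alpha)
        if i % (n // checkpoints) == 0:
            means.append(total / i)
    return means

# alpha = 3: the mean exists (alpha / (alpha - 1) = 1.5), and the
# running mean settles down near it.
stable = running_means(3.0)
# alpha = 0.8: the mean is infinite. The code runs without complaint
# and still reports an "average", but it never converges.
unstable = running_means(0.8)
print(stable)
print(unstable)
```

Nothing in the output flags the second case as meaningless; you only notice if you look at the whole sequence of running means, or at the distribution itself, rather than at a single averaged number.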
If you are a rational agent, you work with averages, because you want to maximize expected utility. In the kinds of simulations I mentioned, it’s especially important to look at the average rather than the median, because most of the harm in the real world (from economic busts, wars, climate change, earthquakes, superintelligences) comes from the outliers. In some cases, the median behavior is negligible. The average is a well-defined way of getting at that, while “probability of a catastrophically large change” is not well-defined until you say what counts as catastrophic.
If you are a rational agent, you work with averages, because you want to maximize expected utility.
Caring about expected utility doesn’t mean you should care about expected anything else. Don’t take averages early. In particular, the expected ratio of girls to boys is not the ratio of expected girls to expected boys. Similarly, “expected rise in water level per year” is irrelevant because utility is not linear in sea level.
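The sex-ratio point can be checked directly. In the classic version of the puzzle (my reconstruction, using the standard stopping rule: each family has children until its first boy), the expected ratio differs from the ratio of expectations:

```python
import random

random.seed(2)

def family():
    # Stopping rule: have children until the first boy.
    girls = 0
    while random.random() < 0.5:  # each child is a girl with prob 1/2
        girls += 1
    return girls, 1  # (girls, boys)

trials = 200_000
ratio_sum = 0.0
g_sum = b_sum = 0
for _ in range(trials):
    g, b = family()
    ratio_sum += g / (g + b)  # average the ratio, per family
    g_sum += g
    b_sum += b

expected_ratio = ratio_sum / trials              # estimates E[G / (G + B)]
ratio_of_expectations = g_sum / (g_sum + b_sum)  # estimates E[G] / (E[G] + E[B])

print(expected_ratio)        # about 0.31, i.e. 1 - ln 2
print(ratio_of_expectations) # about 0.50
```

Averaging girls and boys separately and then dividing gives 1:1, but averaging the per-family ratio does not, because the ratio is a nonlinear function of two random variables. That is exactly the sense in which taking averages early destroys information.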
True. Computing average utility is what’s important. But just using the median as you suggested doesn’t let you compute average utility.