How long does it take to become Gaussian?

The central limit theorems all say that if you convolve stuff enough, and that stuff is sufficiently nice, the result will be a Gaussian distribution. How much is enough, and how nice is sufficient?

Identically-distributed distributions converge quickly

For many distributions f, the repeated convolution f * f * … * f looks Gaussian. The number of convolutions you need before it looks Gaussian depends on the shape of f. This is the easiest variant of the central limit theorem: identically-distributed distributions.

The uniform distribution converges real quick:

The result of uniform(1, 2) * uniform(1, 2) * … * uniform(1, 2), with 30 distributions total. This plot is an animated version of the plots in the previous post. The black curve is the Gaussian distribution with the same mean and variance as the red distribution. The more similar red is to black, the more Gaussian the result of the convolutions is.
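If you want to reproduce this at home, here’s a minimal sketch of the experiment (my reconstruction, not the post’s actual code): discretize the uniform(1, 2) density, convolve it with itself repeatedly with numpy, and compare against the Gaussian with matching mean and variance.

```python
# Repeated self-convolution of uniform(1, 2), compared to the Gaussian
# with the same mean and variance.
import numpy as np
from scipy import stats

dx = 0.01
grid = np.arange(0, 70, dx)                    # wide enough for 30 uniforms
pdf = stats.uniform(loc=1, scale=1).pdf(grid)  # uniform on [1, 2]

result = pdf
for _ in range(29):                            # 30 distributions total
    # * dx keeps the discretized convolution a proper density
    result = np.convolve(result, pdf)[:grid.size] * dx

mean = np.sum(grid * result) * dx               # ~45 = 30 * 1.5
var = np.sum((grid - mean) ** 2 * result) * dx  # ~2.5 = 30 * 1/12
gaussian = stats.norm(mean, np.sqrt(var)).pdf(grid)
print(np.max(np.abs(result - gaussian)))        # tiny => looks Gaussian
```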

The numbers on the x axis keep increasing because the mean of f * g is the sum of the means of f and g, so if we start with positive means, repeated convolutions shoot off into higher numbers. Similarly for the variance: notice how the width starts as the difference between 1 and 2, but ends with differences in the tens. You can keep the location stationary under convolution by starting with a distribution centered at 0, but you can’t keep the variance from increasing, because you can’t have a variance of 0 (except in the limiting case).
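Both facts are easy to sanity-check with samples, since convolving densities is the same as summing independent draws (a quick sketch):

```python
# Means and variances add under convolution: the sum of independent draws
# has mean mu1 + mu2 and variance var1 + var2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(1, 2, 1_000_000)
y = rng.uniform(1, 2, 1_000_000)

print(x.mean() + y.mean(), (x + y).mean())  # both ~3.0
print(x.var() + y.var(), (x + y).var())     # both ~1/6 ≈ 0.167
```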

Here’s a more skewed distribution: beta(50, 1). beta(50, 1) is the probability distribution that represents knowing that a lake has bass and carp, but not how many of each, and then catching 49 bass in a row. It’s fairly skewed! This time, after 30 convolutions, we’re not quite Gaussian—the skew is still hanging around. But for a lot of real applications, I’d call the result “Gaussian enough”.

beta(50, 1) convolved with itself 30 times.

A similar skew in the opposite direction, from the exponential distribution:

exp(20) convolved with itself 30 times.

I was surprised to see the exponential distribution go into a Gaussian, because Wikipedia says that an exponential distribution with parameter λ goes into a gamma distribution with parameters gamma(n, λ) when you convolve it with itself n times. But it turns out gamma(n, λ) looks more and more Gaussian as n goes up.
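That’s easy to check numerically too (a sketch; note scipy parameterizes gamma by scale = 1/rate):

```python
# The sum of n draws from an exponential with rate lam follows
# gamma(shape=n, rate=lam), and the gamma's skewness 2/sqrt(n) shrinks.
import numpy as np
from scipy import stats

lam, n = 20, 30
rng = np.random.default_rng(0)
sums = rng.exponential(scale=1/lam, size=(200_000, n)).sum(axis=1)

gamma = stats.gamma(a=n, scale=1/lam)   # shape n, rate lam
print(sums.mean(), gamma.mean())        # both ~ n/lam = 1.5
print(stats.skew(sums), 2/np.sqrt(n))   # both ~ 0.365
```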

How about our ugly bimodal-uniform distribution?

It starts out rough and jagged, but already by 30 convolutions it’s Gaussian.

And here’s what it looks like to start with a Gaussian:

The red curve starts out the exact same as the black curve, then nothing happens because Gaussians stay Gaussian under self-convolution.
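The identity behind this is exact, and worth writing down: convolving two Gaussians gives another Gaussian whose mean and variance are the sums,

$$\mathcal{N}(\mu_1, \sigma_1^2) * \mathcal{N}(\mu_2, \sigma_2^2) = \mathcal{N}(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2).$$

So self-convolution only slides and widens the red curve; its shape relative to the matching black Gaussian never changes.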

An easier way to measure Gaussianness (Gaussianity?)

We’re going to want to look at many more distributions under convolution and see how close they are to Gaussian, and these animations take a lot of space. We need a more compact way. So let’s measure the kurtosis of the distributions instead. The kurtosis is the standardized fourth moment of a probability distribution (the fourth moment about the mean, divided by the squared variance); it describes the shape of the tails. All Gaussian distributions have kurtosis 3. There are other distributions with kurtosis 3 too, but they’re not likely to be the result of a series of convolutions. So to check how close a distribution is to Gaussian, we can just check how far its kurtosis is from 3.
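Sample-based kurtosis is cheap to compute; here’s a sketch of the measurement (note that scipy returns excess kurtosis by default, so fisher=False recovers the convention where a Gaussian scores 3):

```python
# Track kurtosis as a function of the number of convolutions, using the
# fact that convolving densities = summing independent samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def kurtosis_after(draw, n_convolutions, n_samples=200_000):
    samples = draw((n_samples, n_convolutions)).sum(axis=1)
    return stats.kurtosis(samples, fisher=False)  # Gaussian => 3

for n in (1, 5, 30):
    k = kurtosis_after(lambda size: rng.exponential(1/20, size), n)
    print(n, round(k, 2))  # roughly 9.0, 4.2, 3.2: creeping toward 3
```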

We can chart the kurtosis as a function of how many convolutions have been done so far, for each of the five distributions above:

We see our conclusions from the animations repeated: the exp(20), being very skewed, is the furthest from Gaussian after 30 convolutions. beta(50, 1), also skewed, is also relatively far (though close in absolute terms). The bimodal and uniform got to Gaussian much faster, in the animations, and we see that reflected here by how quickly the green and pink lines approach the kurtosis=3 horizontal line.

Notice: the distributions that have a harder time making it to Gaussian are the two skewed ones. It turns out the skew of a distribution goes a long way in determining how many convolutions you need to get Gaussian. Plotting the kurtosis at convolution 30 against the skew of the original distribution (before any convolutions) shows that skew matters a lot:

The further a distribution is from the gold horizontal, the more self-convolutions it takes for it to reach Gaussian.
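Here’s roughly how that scatter is computed (a sketch with three of the distributions above; the bimodal and the Gammas would slot in the same way):

```python
# Skew of the starting distribution vs. kurtosis after 30 convolutions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

starts = {
    "uniform(1, 2)": lambda s: rng.uniform(1, 2, s),
    "beta(50, 1)":   lambda s: rng.beta(50, 1, s),
    "exp(20)":       lambda s: rng.exponential(1/20, s),
}

for name, draw in starts.items():
    samples = draw((200_000, 30))
    skew0 = stats.skew(samples[:, 0])   # skew before any convolutions
    kurt30 = stats.kurtosis(samples.sum(axis=1), fisher=False)
    print(f"{name:13}  skew={skew0:+.2f}  kurtosis after 30: {kurt30:.2f}")
    # the more skewed the start, the further the kurtosis stays from 3
```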

So the skew of the component distributions goes a long way in determining how quickly their convolution gets Gaussian. The Berry-Esseen theorem is a central limit theorem that says something similar (but see this comment). So here’s our first rule for eyeing things in the wild and trying to figure out whether the central limit theorem will apply well enough to get you a Gaussian distribution: how skewed are the input distributions? If they’re viciously skewed, you should worry.
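For reference, the usual statement of the Berry-Esseen bound (quoted from memory, so double-check the constant): if the X_i are i.i.d. with mean μ, variance σ², and finite third absolute moment ρ = E|X₁ − μ|³, then the CDF F_n of the standardized n-fold sum satisfies

$$\sup_x \left| F_n(x) - \Phi(x) \right| \le \frac{C\,\rho}{\sigma^3 \sqrt{n}},$$

where Φ is the standard normal CDF and C is an absolute constant known to be smaller than 0.5. The ρ/σ³ factor is the formal version of the “how skewed is the input” penalty.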

Non-identically-distributed distributions converge quickly too

In real problems, distributions won’t be identically distributed. This is the interesting case. If instead of a single distribution f convolved with itself we take the convolution f_1 * f_2 * … * f_n of different distributions, then a version of the central limit theorem still applies: the result can still be Gaussian. So let’s take a look.

Here are three different Beta distributions on the left, and their convolution on the right, with the same setup as in the animations: red is the convolution, and black is the true Gaussian with the same mean and variance.

They’ve almost converged! This is a surprise. I really didn’t expect as few as three distributions to convolve into this much of a Gaussian. On the other hand, these are pretty nice distributions—the blue and green look pretty Gaussian already. That’s cheating. Let’s try less nice distributions. We saw above that distributions with higher skew are less Gaussian after convolution, so let’s crank up the skew. It’s hard to get a good skewed Beta distribution, so let’s use Gamma distributions instead.
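For the curious, here’s the shape of that experiment (the Beta parameters below are stand-ins I picked; the post doesn’t list the originals):

```python
# Convolve three different Beta densities and compare against the Gaussian
# with matching mean and variance. The parameters are assumed, for
# illustration only.
import numpy as np
from scipy import stats

dx = 0.001
grid = np.arange(0, 1, dx)
betas = [stats.beta(5, 3), stats.beta(4, 4), stats.beta(6, 2)]

result = betas[0].pdf(grid)
for b in betas[1:]:
    result = np.convolve(result, b.pdf(grid)) * dx

sum_grid = dx * np.arange(result.size)          # support of the sum: [0, 3)
mean = np.sum(sum_grid * result) * dx
var = np.sum((sum_grid - mean) ** 2 * result) * dx
gaussian = stats.norm(mean, np.sqrt(var)).pdf(sum_grid)
print(np.max(np.abs(result - gaussian)))        # small, with just three
```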

While the three distributions on the left might look quite similar, that’s only because of the extended range of the plot: I extended the x-axis up to four standard deviations away from the mean for each distribution individually (which shows you how skewed these Gammas are!). The Gammas are not super similar: their means, in order, are (350, 128, 436), and their standard deviations are (26, 9, 29).

Not Gaussian yet—the red convolution result line still looks Gamma-ish. But if we go up to a convolution of 30 gamma distributions this skewed...

… already, we’re pretty much Gaussian.

I’m really surprised by this. I started researching this expecting that, once the input distributions got this skewed, the central limit theorem convergence would fall apart and require a lot of distributions. I would have guessed you needed to convolve hundreds of them to approach Gaussian. But at 30, they’re already there! This helps explain how carefree people can be in assuming the CLT applies, sometimes even when they haven’t looked at the distributions: convergence really doesn’t take much.