The central limit theorem in terms of convolutions

There are multiple versions of the central limit theorem. One common version says that, as $n \to \infty$, the mean of means looks like a Gaussian distribution. People in industry sometimes ignore the “goes to infinity” part, and assume that if $n$ is large, that’s good enough. But is that true? When is it safe to assume that your means have converged? I heard of a case recently where the mean of means was assumed to have converged, hadn’t, and the application it was a part of suffered accuracy problems for months without anyone knowing. This sequence will explore: under what conditions does the central limit theorem “fail” like this?

The central limit theorem is about convolutions

There are multiple versions of the central limit theorem. They’re all a version of the statement:

If you have a bunch of distributions (say, $n$ of them), and you convolve them all together into a distribution $f$, then the larger $n$ is, the more $f$ will resemble a Gaussian distribution.

The simplest version of the central limit theorem requires that the distributions must be 1) independent and 2) identically distributed. In this sequence, I’m gonna assume #1 is true. We’ll find that while condition #2 is nice to have, even without it, distributions can converge to a Gaussian under convolution.
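
To make “resembles a Gaussian” concrete, here’s a minimal sketch in Python (assuming NumPy; the lopsided starting distribution is an arbitrary choice of mine, nothing special). It convolves a distribution with itself repeatedly and tracks the skewness, which is 0 for a Gaussian:

```python
import numpy as np

# A lopsided distribution on the points 0..4 -- nothing Gaussian about it.
p = np.array([0.7, 0.0, 0.1, 0.0, 0.2])

dist = p.copy()
for n in range(2, 31):
    dist = np.convolve(dist, p)  # fold in one more copy of p
    if n in (2, 5, 10, 30):
        x = np.arange(len(dist))
        mean = np.sum(x * dist)
        var = np.sum((x - mean) ** 2 * dist)
        skew = np.sum(((x - mean) / np.sqrt(var)) ** 3 * dist)
        print(f"n={n:2d}  skewness={skew:+.4f}")  # drifts toward 0
```

(For identical distributions with a finite third moment, the skewness of the convolution shrinks like $1/\sqrt{n}$ - one concrete sense of “resembles a Gaussian more” as $n$ grows.)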

A Gaussian distribution is the same thing as a Normal distribution. Some examples of Gaussian distributions:

[Figure: examples of Gaussian distributions, from Wikipedia]

Wait - this doesn’t sound like the central limit theorem I know

Most statements of the central limit theorem, including Wikipedia’s, talk in terms of the sums of random variables (and their density functions and expected values). But this is the same thing as our convolution-of-distributions, because the density function of the sum of two independent random variables $X$ and $Y$ is the convolution of the density functions of $X$ and $Y$. Looking at the central limit theorem in terms of convolutions will make it easier to see some things. It’s also useful if your version of probability doesn’t have the concept of a random variable, like probability theory as extended logic*.
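
As a quick sanity check of that equivalence, here’s a sketch (Python with NumPy assumed; the exponential-plus-uniform pair is just an example I picked): the histogram of samples of $X + Y$ should match the numerical convolution of the two density functions.

```python
import numpy as np

rng = np.random.default_rng(0)
# X ~ Exponential(1), Y ~ Uniform(0, 1); sample their sum directly.
samples = rng.exponential(1.0, size=200_000) + rng.uniform(0.0, 1.0, size=200_000)

# Discretize the two densities on a shared grid and convolve them.
dx = 0.01
x = np.arange(0.0, 10.0, dx)
f = np.exp(-x)                    # density of X
g = np.where(x < 1.0, 1.0, 0.0)   # density of Y
density_of_sum = np.convolve(f, g) * dx

# Compare the convolution against a histogram of the sampled sums.
hist, edges = np.histogram(samples, bins=80, range=(0.0, 8.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
predicted = np.interp(centers, np.arange(len(density_of_sum)) * dx, density_of_sum)
print("max abs difference:", np.abs(hist - predicted).max())  # small
```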

Another statement of the central limit theorem, from Jaynes:

The central limit theorem tells us what is essentially a simple combinatorial fact, that out of all conceivable error vectors that could be generated, the overwhelming majority have about the same degree of cancellation.

Hopefully this is enough to convince you that the central limit theorem need not be stated in terms of random variables or samples from a population.

(Also: the central limit theorems are sometimes talked about in terms of means only. They do say things about means, but they also say a lot about the shape of the whole distribution. I’ll talk about both.)

*: first three chapters here.

Convolutions

The central limit theorem is a statement about the result of a sequence of convolutions. So to understand the central limit theorem, it’s really important to know what convolutions are and to develop a good intuition for them.

Take two functions, $f$ and $g$. Reverse $g$ across the y-axis, then slide it and $f$ along each other; at each offset, multiply the overlapping values and add the products up. The result, as a function of the offset, is the convolution of $f$ and $g$. Here’s a picture - $f$ is blue, $g$ is red, and the convolution is black:

[Figure: two functions being slid past each other and convolved, from Wikipedia]

Another one, with a different function $g$:

You can write this as $(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$.
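
In code, the reverse-and-slide description is only a few lines. Here’s a from-scratch sketch for discrete sequences (plain Python; NumPy appears only to cross-check, and both input sequences are made up):

```python
import numpy as np

def convolve(f, g):
    """(f * g)[t] = sum over tau of f[tau] * g[t - tau]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for t in range(len(out)):
        for tau in range(len(f)):
            if 0 <= t - tau < len(g):      # g, reversed, slid to offset t
                out[t] += f[tau] * g[t - tau]
    return out

f = [0.2, 0.5, 0.3]
g = [0.6, 0.4]
print(convolve(f, g))       # [0.12, 0.38, 0.38, 0.12]
print(np.convolve(f, g))    # same numbers
```

The index $t - \tau$ is exactly the “reverse $g$ and slide it” step: for a fixed offset $t$, $g$ is read backwards as $\tau$ advances.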

The first two sections on this page give a nice explanation of convolutions in terms of dropping a ball twice. This page lets you choose two functions to convolve visually, and I highly recommend it for getting a feel for the operation.

Convolution has nice properties

Convolution is associative and commutative - if you want to convolve multiple functions together, it doesn’t matter what order you do it in. (Also, nothing I’ve said in this section requires $f$ and $g$ to be probability distributions - they can be any two functions.)

You can get the mean of a convolution without actually doing the convolution. If the first moments of $f$ and $g$ are $\mu_f$ and $\mu_g$ (for probability distributions, the first moment is the mean), then the first moment of $f * g$ is $\mu_f + \mu_g$. There is a hint here about why people often talk about means when they talk about the central limit theorem - being able to get the mean of the convolution by just adding the means of the underlying distributions together is really nice.

You can get the second moment without actually convolving, too. (The second moment is closely related to the variance: if the mean is 0, the second moment and the variance are the same.) For $f$ and $g$ with variances $\sigma_f^2$ and $\sigma_g^2$, the variance of $f * g$ is $\sigma_f^2 + \sigma_g^2$ (equivalently, raw second moments add whenever the means are zero). The same additivity holds for the higher cumulants - the higher-order analogues of mean and variance that add under convolution - for as long as $f$ and $g$ actually have the corresponding moments themselves.
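
A quick numeric check of both additivity facts (a sketch assuming NumPy; $f$ and $g$ are arbitrary made-up distributions on the points $0, 1, 2, \ldots$):

```python
import numpy as np

def moments(p):
    """Mean and variance of a discrete distribution on 0, 1, 2, ..."""
    x = np.arange(len(p))
    mean = np.sum(x * p)
    var = np.sum((x - mean) ** 2 * p)
    return mean, var

f = np.array([0.1, 0.4, 0.5])
g = np.array([0.3, 0.3, 0.2, 0.2])

mean_f, var_f = moments(f)
mean_g, var_g = moments(g)
mean_c, var_c = moments(np.convolve(f, g))

print(mean_c, mean_f + mean_g)  # equal: means add
print(var_c, var_f + var_g)     # equal: variances add
```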

If $\hat{f}$ and $\hat{g}$ are the Fourier transforms of $f$ and $g$, then the Fourier transform of $f * g$ is the product $\hat{f} \cdot \hat{g}$. This is yet another case where you don’t actually have to compute the convolution to get the thing you want. I don’t actually use Fourier transforms or have any intuition about them, but for those who do, maybe this is useful?
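
For those who do have that intuition, here’s a sketch of the identity in code (assuming NumPy; the random inputs are arbitrary). The one wrinkle is zero-padding: a product of FFTs corresponds to a circular convolution, so both inputs are padded out to the full output length to make it match the ordinary (linear) convolution.

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random(64)
g = rng.random(64)

conv = np.convolve(f, g)        # length 64 + 64 - 1 = 127

# Pad both inputs to the full convolution length, then compare
# FFT(f * g) with FFT(f) * FFT(g) pointwise.
n = len(conv)
lhs = np.fft.fft(conv)
rhs = np.fft.fft(f, n) * np.fft.fft(g, n)
print(np.allclose(lhs, rhs))    # True
```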

If $f$ sums to 1 and $g$ sums to 1, then $f * g$ sums to 1. So if $f$ and $g$ are probability distributions, so is $f * g$.

In the next post, I’ll look at what happens when you convolve a bunch of different distributions together, and explore how much of what happens depends on the form of those distributions.


Postscript: not neural networks

You may have heard of convolutional neural networks from all their success in image processing. They involve the same kind of convolution I’m talking about here, but in higher dimensions (two or more, for images). I’ll only be talking about convolutions of one-dimensional functions in this sequence. Some (or a lot?) of the stuff I’ll say about the central limit theorem probably applies in higher dimensions too, but I’m not sure what changes as the dimension increases. So, this sequence is not about the neural networks.