Surprised no one has brought up the Fourier domain representation / characteristic functions. Over there, convolution is just repeated multiplication, so what this gives is $\hat f(\omega/\sqrt{n})^n$. Conveniently, Gaussians stay Gaussians, and the fact that we have probability distributions fixes $\hat f(0)=1$. So what we're looking for is how quickly the product above squishes to a Gaussian around $\omega=0$, which looks to be in large part determined by the tail behavior of $\hat f$. I suspect what is driving your result of needing few convolutions is the fact that you're working with smooth, mostly low-frequency functions. For example, exp, which is pretty bad, still has $O(1/n^2)$ decay. By throwing in some jagged edges, you could probably concoct a function which will eventually converge to a Gaussian, but will take rather a long time to get there (for functions which are piecewise smooth, the decay is $O(1/n)$).
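As a quick numerical sketch of that last point (plain numpy; the helper name and the two test densities are made up for illustration, not from the post), you can convolve a density with itself repeatedly, standardize, and watch the L1 distance to a standard normal shrink, faster for a smooth bump than for something with jump discontinuities:

```python
import numpy as np

def l1_distance_to_gaussian(density, dx, n):
    """L1 distance between the standardized n-fold convolution and N(0,1)."""
    pdf = density / (density.sum() * dx)      # normalize to a probability density
    conv = pdf.copy()
    for _ in range(n - 1):                    # n-fold convolution of pdf with itself
        conv = np.convolve(conv, pdf) * dx
    x = np.arange(len(conv)) * dx             # support grid of the convolution
    mu = np.sum(x * conv) * dx                # mean of the n-fold sum
    sigma = np.sqrt(np.sum((x - mu) ** 2 * conv) * dx)  # std dev of the sum
    z = (x - mu) / sigma                      # standardized coordinate
    normal = np.exp(-z ** 2 / 2) / np.sqrt(2 * np.pi)
    # in z-coordinates the summed density is sigma * conv; integrate |difference|
    return np.sum(np.abs(sigma * conv - normal)) * (dx / sigma)

x = np.linspace(0, 1, 512)
dx = x[1] - x[0]
smooth = np.sin(np.pi * x) ** 2               # smooth, low-frequency bump
jagged = 1 + np.sign(np.sin(20 * np.pi * x))  # square wave: lots of jump edges
for n in (2, 4, 8, 16):
    print(n, l1_distance_to_gaussian(smooth, dx, n),
          l1_distance_to_gaussian(jagged, dx, n))
```

The direct convolution loop is slow but keeps the bookkeeping obvious; the Fourier-domain version would just raise the FFT of the density to the n-th power.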
One of these days I'll take a serious look at characteristic functions, which is roughly the statistician's way of thinking about what I was saying. There's probably an adaptation of the characteristic function proof of the CLT that would be useful here.
I was thinking something similar. I vaguely remember that the characteristic function proof includes an assumption that n is large, where n is the number of variables being summed. I think that allows you to ignore some higher-order terms in n. So by keeping those in, you could probably get some way to quantify how "close" a resulting distribution is to a Gaussian. And you could relate that back to moments quite naturally as well.
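Concretely, I think the bookkeeping goes like this (standard cumulant expansion, sketched from memory): for a standardized summand with mean 0, variance 1, and third cumulant $\kappa_3$,

$$\log \hat f(t) = -\frac{t^2}{2} + \frac{\kappa_3 (it)^3}{6} + O(t^4),$$

so raising the rescaled characteristic function to the n-th power gives

$$\hat f\!\left(\frac{\omega}{\sqrt{n}}\right)^{\!n} = \exp\!\left(-\frac{\omega^2}{2} + \frac{\kappa_3 (i\omega)^3}{6\sqrt{n}} + O\!\left(\frac{1}{n}\right)\right) = e^{-\omega^2/2}\left(1 + \frac{\kappa_3 (i\omega)^3}{6\sqrt{n}} + O\!\left(\frac{1}{n}\right)\right).$$

So the leading deviation from the Gaussian decays like $n^{-1/2}$ and is proportional to the third cumulant, i.e. the skewness; keeping further terms gives the Edgeworth expansion, and the same mechanism underlies Berry-Esseen-type bounds. That's exactly the moments connection.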