So the skew of the component distributions goes a long way in determining how quickly their convolution becomes Gaussian. The Berry-Esseen theorem is a central limit theorem that says something similar.
It might be helpful to quantify exactly how the Berry-Esseen theorem relates to the skew, because, as you hint, it isn't a direct correspondence. If, like I did, you expect Berry-Esseen to use the skew directly, you'll be in for a good deal of confusion.
In the simplest incarnation of the theorem, consider an IID sequence of observations $(X_i)$ with finite mean $\mu$, finite third absolute moment $E[|X_1|^3]$, and positive standard deviation $\sigma$. Let $F_n$ be the CDF of the rescaled and centered sample mean $\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma}$ and let $\Phi$ be the CDF of the normal $N(0,1)$. Berry-Esseen upper-bounds the Kolmogorov-Smirnov statistic:
$$\sup_{x \in \mathbb{R}} |F_n(x) - \Phi(x)| \le \frac{C \, E[|X_1 - \mu|^3]}{\sigma^3 \sqrt{n}},$$
where $C$ is some constant ($0.5$ works). This theorem is rad because it bounds how long the CLT takes to work its magic.
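As a rough sanity check of the bound above, here is a minimal sketch that computes the Kolmogorov-Smirnov distance exactly for sums of Bernoulli(p) variables and compares it to the Berry-Esseen right-hand side with $C = 0.5$. The choice of Bernoulli observations and the function names `ks_distance_bernoulli` and `berry_esseen_bound` are my own assumptions for illustration, not something from the post.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal N(0, 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance_bernoulli(n, p):
    """Exact sup_x |F_n(x) - Phi(x)| for sqrt(n) * (mean - mu) / sigma
    when the observations are IID Bernoulli(p) (illustrative choice)."""
    mu = p
    sigma = math.sqrt(p * (1.0 - p))
    worst = 0.0
    cdf = 0.0
    for k in range(n + 1):
        # Standardized location of the k-th jump of the step function F_n.
        x = (k - n * mu) / (sigma * math.sqrt(n))
        phi = normal_cdf(x)
        # Compare Phi with F_n just before the jump...
        worst = max(worst, abs(cdf - phi))
        cdf += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        # ...and at the jump itself.
        worst = max(worst, abs(cdf - phi))
    return worst

def berry_esseen_bound(n, p, C=0.5):
    """C * E|X - mu|^3 / (sigma^3 * sqrt(n)) for Bernoulli(p)."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = p * (1.0 - p) * ((1.0 - p) ** 2 + p**2)  # E|X - p|^3
    return C * rho / (sigma**3 * math.sqrt(n))

for n in (1, 10, 100):
    for p in (0.5, 0.1):
        print(n, p, ks_distance_bernoulli(n, p), berry_esseen_bound(n, p))
```

Because $F_n$ is a step function and $\Phi$ is increasing, the supremum is attained at a jump point, so checking the left limit and the value at each jump is enough.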
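However, skew is $E\left[\left(\frac{X_1 - \mu}{\sigma}\right)^3\right] \le E\left[\left|\frac{X_1 - \mu}{\sigma}\right|^3\right] = \frac{E[|X_1 - \mu|^3]}{\sigma^3}$, and so the quantity used by Berry-Esseen is lower-bounded by the absolute value of the skew. This is the precise link to be made with the Berry-Esseen theorem.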
If you don't realize that the absolute skew is not the RHS of Berry-Esseen, you'll be confused as follows. All symmetric distributions have 0 skew, so you'd expect sums of uniform distributions to converge to Gaussian instantly (since the upper bound on the Kolmogorov-Smirnov statistic would be 0).
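To make the uniform case concrete, here is a small sketch that numerically integrates both quantities for Uniform(-1, 1): the skew and the standardized absolute third moment that Berry-Esseen actually uses. The helper `standardized_third_moments` and the midpoint-rule integration are assumptions of mine for illustration. In closed form, $\sigma^2 = 1/3$ and $E[|X|^3] = 1/4$, so the Berry-Esseen ratio is $\frac{1/4}{(1/3)^{3/2}} = \frac{3\sqrt{3}}{4} \approx 1.3$, which is clearly not 0.

```python
import math

def standardized_third_moments(density, lo, hi, steps=200_000):
    """Midpoint-rule estimates of the skew E[((X - mu) / sigma)^3] and of
    E[|(X - mu) / sigma|^3] for a density supported on [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    ws = [density(x) * h for x in xs]  # probability mass of each small cell
    mean = sum(w * x for w, x in zip(ws, xs))
    sigma = math.sqrt(sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)))
    skew = sum(w * ((x - mean) / sigma) ** 3 for w, x in zip(ws, xs))
    abs_third = sum(w * abs((x - mean) / sigma) ** 3 for w, x in zip(ws, xs))
    return skew, abs_third

# Uniform(-1, 1) has constant density 1/2 on its support.
skew, abs_third = standardized_third_moments(lambda x: 0.5, -1.0, 1.0)
print("skew:", skew)
print("Berry-Esseen ratio:", abs_third)
```

So the skew vanishes by symmetry while the Berry-Esseen bound for uniforms is about $0.65/\sqrt{n}$ with $C = 0.5$, which still shrinks only at the $1/\sqrt{n}$ rate.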
Thanks for investigating, this is helpful—I added a link to this comment to the post.