gathaung comments on The robust beauty of improper linear models

• Nice. To make your proposed explanation more precise:

Take a random vector on the n-dimensional unit sphere. Project it to the nearest (±1)/sqrt(n) sign vector: what is the expected l_2-distance / angle, and how does it scale with n?

If this value decreases in n, then your explanation is essentially correct. Or did you want to propose something else?

Start by taking a random vector x where each coordinate is a unit Gaussian (normalize later). The projection p_x just takes signs, p_x = sign(x)/sqrt(n): it splits the coordinates into positive ones and negative ones.
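A quick sanity check of this claim (my own sketch in NumPy, not part of the original comment): for small n we can brute-force all 2^n sign vectors and confirm that sign(x)/sqrt(n) really is the nearest one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
x = rng.standard_normal(n)
x /= np.linalg.norm(x)  # a point on the unit sphere

p = np.sign(x) / np.sqrt(n)  # claimed nearest sign vector

# brute force over all 2^n vectors of the form (+-1)/sqrt(n)
best = None
for bits in range(2 ** n):
    s = np.array([1.0 if bits >> i & 1 else -1.0 for i in range(n)]) / np.sqrt(n)
    d = np.linalg.norm(x - s)
    if best is None or d < best[0]:
        best = (d, s)

# the componentwise sign choice maximizes <x, s>, hence minimizes |x - s|
assert np.allclose(best[1], p)
```

This works because |x - s|^2 = |x|^2 + 1 - 2⟨x, s⟩, so the nearest s is the one maximizing ⟨x, s⟩, which is achieved coordinate-by-coordinate by matching signs.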

We are interested in the expected cosine of the angle, E[⟨x, p_x⟩ / |x|] = E[ sum_i |x_i| / (|x| sqrt(n)) ].

If the dimension is large enough, we won't really need to normalize; it is enough to start with Gaussians of standard deviation 1/sqrt(n), since the resulting vector almost surely has length close to 1. Then all components are independent.

For the angle, we then (approximately) need to compute E[sum_i |x_i| / n], where each x_i is a unit Gaussian. This is asymptotically independent of n; so it appears that this explanation of improper linear models fails.

Darn, after reading your comment I mistakenly believed that this would be yet another case of “obvious from high-dimensional geometry” /​ random projection.

PS. In what sense are improper linear models working? In the l_1, l_2, or l_infty sense?

Edit: I was being stupid; leaving the above for future ridicule. The relevant criterion is whether E[sum_i |x_i| / n] = 1, not whether E[sum_i |x_i| / n] = 0.

The folded Gaussian tells us that E[sum_i |x_i| / n] = sqrt(2/pi), for large n. The explanation still does not work, since sqrt(2/pi) ≈ 0.80 < 1, and this gives us the expected error margin of improper high-dimensional models.
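The computation above is easy to check numerically. Here is a Monte Carlo sketch (again my own illustration, not from the thread) of the expected cosine between a Gaussian vector x and its sign projection, which should sit near sqrt(2/pi) ≈ 0.798 regardless of n:

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_cosine(n, trials=2000):
    """Average cosine between x ~ N(0, I_n) and sign(x)/sqrt(n)."""
    x = rng.standard_normal((trials, n))
    # <x, sign(x)/sqrt(n)> / |x|  =  sum_i |x_i| / (|x| sqrt(n))
    cos = np.abs(x).sum(axis=1) / (np.linalg.norm(x, axis=1) * np.sqrt(n))
    return cos.mean()

for n in (10, 100, 1000, 10000):
    print(n, mean_cosine(n))  # values near sqrt(2/pi) ~ 0.798 for all n
```

The output does not converge to 1 as n grows, confirming that the "high-dimensional projection" explanation alone cannot account for improper linear models working well.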

@Stuart: What are the typical empirical errors? Do they happen to be near sqrt(2/​pi), which is close enough to 1 to be summarized as “kinda works”?