About 1/log(n) of the numbers are prime and have value −1.
As we multiply by more and more factors, λ should alternate between ±1.
But starting at −1 should introduce a bias towards −1.
It’s not clear how big the bias would be. A natural guess would be in the ballpark of 1/log(n).
A bit more precisely, I think the number of prime factors is roughly 1 plus a Poisson with mean log log n. (I’m just getting that from some random google results, e.g. here. But it seems intuitively right: each number k has a probability of about 1/k of dividing n and a probability of about 1/log k of being prime, and the sum of 1/(k log k) is log log n.)
Poisson distributions are more likely to be even than odd (and we’re adding 1, so more likely to be odd than even). This computation gives you a bias of order e^(−2λ). Putting in λ = log log n we get a bias of order 1/log(n)^2.
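For reference, the e^(−2λ) figure falls straight out of the Poisson generating function:

```latex
% Parity bias of X ~ Poisson(lambda):
\mathbb{E}\big[(-1)^X\big]
  = \sum_{k \ge 0} (-1)^k \, e^{-\lambda} \frac{\lambda^k}{k!}
  = e^{-\lambda} \sum_{k \ge 0} \frac{(-\lambda)^k}{k!}
  = e^{-\lambda} \cdot e^{-\lambda}
  = e^{-2\lambda}
% so P(X even) - P(X odd) = e^{-2\lambda}, and lambda = log log n
% gives e^{-2 log log n} = 1/(log n)^2.
```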
That’s still way bigger than the real bias 1/√n.
So tentatively I’m pretty surprised by how close this is to 50-50. I do feel like I’m probably overlooking something or messed up that argument. If not then this is a bit of a different situation from the other surprises, since there’s a heuristic argument suggesting a big bias that we don’t actually see. It would be good to understand what’s up here.
(From my perspective this is probably the more interesting kind of failure, since it suggests that this is a case where the heuristic argument really “was wrong” instead of merely overlooking something that could be pointed out by a more sophisticated argument.)
The Poisson approximation is not good: it’s actually a known theorem that the number of prime factors of N for N large behaves like it’s distributed normally with mean log log N + O(1) and standard deviation (1+o(1))·√(log log N). Since the whole even/odd distinction relies crucially on the Poisson approximation, it also fails here. (This part is incorrect, the distinction between Poisson and normal doesn’t matter in this case because both approximations are too low resolution to be useful. See my next comment in the thread.)
It’s easy to give a heuristic justification for this: the basic idea is that if you look at an interval [N,2N] for N large, then the indicator functions 1_p : [N,2N] → {0,1} which take the value 1 if p divides the input and 0 otherwise are all “independent enough” for, say, p ≤ N^(1/4).
Now, you want to know the distribution of
number of prime divisors(x) = Σ_{p ≤ 2N} 1_p(x)
Primes between N^(1/4) and 2N don’t contribute much to this sum, because the sum of the reciprocals of the primes scales like log log N, so you can pretend they don’t exist. When you do that, the remaining indicator functions are all uncorrelated enough with each other thanks to the Chinese remainder theorem that you can apply some version of the central limit theorem to conclude this random variable defined on [N,2N] basically behaves as if it’s normally distributed. It’s easy to check both its mean and variance are ~ log log N.
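That mean/variance claim is easy to sanity-check numerically. A small sketch (the choice N = 10^4 is arbitrary, and this counts distinct prime divisors by trial division rather than anything clever):

```python
# Count distinct prime divisors omega(x) for x in [N, 2N) and compare
# the empirical mean and variance against log log N.  N = 10**4 is an
# arbitrary choice, small enough that trial division is fast.
import math

def omega(x: int) -> int:
    """Number of distinct prime divisors of x, by trial division."""
    count, d = 0, 2
    while d * d <= x:
        if x % d == 0:
            count += 1
            while x % d == 0:
                x //= d
        d += 1
    if x > 1:
        count += 1  # leftover prime factor > sqrt(original x)
    return count

N = 10**4
values = [omega(x) for x in range(N, 2 * N)]
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, var, math.log(math.log(N)))  # all the same order
```

Both empirical statistics land within O(1) of log log N, which is all this low-resolution approximation promises.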
That’s the same heuristic argument I was imagining for why it would be Poisson (except I wasn’t being careful and thinking about CRT, or explicitly cutting out the large primes, just naively assuming independence by default). When mean and variance are equal, isn’t a Poisson distribution extremely close to a normal distribution? Or put differently: if you are adding up coins whose probability of heads is close to 0, I’d expect the Poisson approximation to be as good as the normal approximation. There are higher-order differences between the two distributions, but is it specifically known to be more normal than Poisson?
I guess the big problem is that you need an extremely accurate approximation to get at the parity, and neither of these approximations is very accurate? E.g. adding 1 is a pretty small impact on the distribution, but a huge impact on the parity.
Another pass at the heuristic argument:
The number of copies of a prime p that divides x is geometrically distributed. The probability that the count is odd is 1/(p+1). I’d guess these events are all pretty independent for p < x^(1/4) based on what you said.
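Checking the 1/(p+1) figure: modeling the exponent v_p(x) as geometric, i.e. P(v_p(x) = k) ≈ (1 − 1/p)·p^(−k), the odd-count probability is

```latex
\Pr\big[v_p(x)\ \text{odd}\big]
  = \sum_{k\ \text{odd}} \Big(1-\tfrac{1}{p}\Big)\, p^{-k}
  = \Big(1-\tfrac{1}{p}\Big) \cdot \frac{1/p}{1-1/p^{2}}
  = \frac{1/p}{1+1/p}
  = \frac{1}{p+1}
```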
If we have a bunch of coins with probability 1/(p+1) of being odd, the probability of the sum being even is 1/2 + (1/2)·∏(1 − 2/(p+1))
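A quick numerical look at that product (a sketch; the cutoffs P are arbitrary). Since each factor is roughly 1 − 2/p and Σ 1/p ≈ log log P, the product should decay like 1/log(P)^2 up to a constant:

```python
# Evaluate prod_{p <= P} (1 - 2/(p+1)) for a few cutoffs P and compare
# against 1/log(P)^2: the rescaled column should hover near a constant.
import math

def primes_up_to(limit: int) -> list[int]:
    """Sieve of Eratosthenes."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for d in range(2, int(limit**0.5) + 1):
        if sieve[d]:
            sieve[d*d :: d] = [False] * len(sieve[d*d :: d])
    return [p for p, is_p in enumerate(sieve) if is_p]

for P in (10**3, 10**4, 10**5):
    prod = 1.0
    for p in primes_up_to(P):
        prod *= 1 - 2 / (p + 1)
    ratio = prod * math.log(P) ** 2
    print(P, prod, ratio)  # ratio roughly constant as P grows
```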
That’s a bias of order 1/log(x)^2 again.
So it seems like the count of small prime divisors should have a very significant bias towards being even.
So I guess all the action is in the large prime divisors, and probably from the anticorrelation with the count of small prime divisors (seems like the magic is that 0 isn’t a valid answer). I guess we kind of knew that anyway, since it’s going to flip the sign of the bias. In expectation there are O(1) large prime divisors, but that’s still enough to totally dominate the calculus.
This doesn’t feel like a promising line of attack. Would be interesting to check the claim that the count of small prime divisors with multiplicities is quite even-skewed.
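A rough simulation of that check (a sketch; the sample range starting at 10^6 and the cutoff x^(1/4) are arbitrary choices):

```python
# Count prime divisors p < x^(1/4) of x, with multiplicity, for x in a
# range near 10**6, and measure how often that count is even.  The coin
# product above predicts an even-fraction of roughly 0.52 here.
def small_prime_divisor_count(x: int, cutoff: int) -> int:
    """Prime divisors of x below `cutoff`, counted with multiplicity."""
    count, d = 0, 2
    while d < cutoff:
        while x % d == 0:  # composite d never divides: factors removed
            count += 1
            x //= d
        d += 1
    return count

N = 10**6
cutoff = int(N ** 0.25)  # ~31
total = 10**5
even = sum(1 for x in range(N, N + total)
           if small_prime_divisor_count(x, cutoff) % 2 == 0)
print(even / total)  # noticeably above 1/2
```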
I agree with all of that, and I think if you simulated it you’d indeed find a large bias for the number of small prime divisors to be even. The problem is that 1/log(x)^2 is such a small bias in expectation that it will already be offset by just the primes in the interval [x,2x], which contribute on the order of ~1/log x to the average in the opposite direction, since all primes trivially have an odd number of prime factors (namely one).
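The ~1/log x density of primes in [x,2x] is easy to confirm numerically (a sketch; x = 10^5 is an arbitrary choice):

```python
# The fraction of integers in [x, 2x) that are prime is about 1/log(x);
# each such prime flips the parity by itself, which is why an O(1/log x)
# contribution swamps the 1/log(x)^2 even-bias from the small primes.
import math

def prime_count_in(lo: int, hi: int) -> int:
    """Count primes in [lo, hi) with a simple sieve up to hi."""
    sieve = [True] * hi
    sieve[0] = sieve[1] = False
    for d in range(2, int(hi**0.5) + 1):
        if sieve[d]:
            sieve[d*d :: d] = [False] * len(sieve[d*d :: d])
    return sum(sieve[lo:hi])

x = 10**5
density = prime_count_in(x, 2 * x) / x
print(density, 1 / math.log(x))  # both around 0.08-0.09
```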
Furthermore, there are subtleties here about exactly how much independence you need. If you want everything to be jointly independent then you can really only work with the primes up to log x while being safe—this is because the product of the primes up to x is roughly of order e^x. Once you go past that, while correlations involving only a small number of primes are still fine, correlations involving lots of primes break down, and for parity problems you need to control the entire joint distribution in a fine way.
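The “product of the primes up to x is roughly e^x” claim is Chebyshev’s estimate θ(x) = Σ_{p≤x} log p ~ x, which is also easy to eyeball numerically (a sketch; the cutoffs are arbitrary):

```python
# log(product of primes up to X) = sum of log p over p <= X, i.e.
# Chebyshev's theta function.  theta(X)/X should tend to 1, so the
# product of primes up to X is e^(X(1+o(1))) — which is why joint
# independence can only be pushed up to primes around log x.
import math

def theta(limit: int) -> float:
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for d in range(2, int(limit**0.5) + 1):
        if sieve[d]:
            sieve[d*d :: d] = [False] * len(sieve[d*d :: d])
    return sum(math.log(p) for p, is_p in enumerate(sieve) if is_p)

for X in (10**3, 10**4, 10**5):
    print(X, theta(X) / X)  # ratio approaches 1 from below
```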
This is not a problem for the normal approximation because to show convergence in distribution to a normal distribution, you just need to show all the moments converge to the right values and use a result like Stone–Weierstrass to approximate any continuous test function uniformly by a polynomial. You can do this just by working with primes up to x^α for α depending on the exact moment you’re studying. However, this result is really “low resolution”, as you correctly identify.
It so happens that I asked a relevant question on MathOverflow recently. Very difficult to find any elementary explanation for this phenomenon.