I am extremely interested in examples of heuristic arguments like this where the naive version leads you astray and it takes much more sophisticated arguments to fix the problem. I’d be happy to pay a $500 bounty for any examples that I find really interesting, in number theory or elsewhere. (Usual disclaimers, my judgment of what counts as interesting will be completely arbitrary and unaccountable, etc.)
The best example I found was the Chebyshev bias that primes are very slightly more likely to be 3 mod 4 than 1 mod 4. The simplest explanation I know of this goes through an explicit formula for the von Mangoldt function’s distribution mod 4 as explained by Terry Tao here.
(Roughly speaking: the prime powers should be uniform mod 4, based on an argument about L-functions that I don’t understand. But the prime squares are all 1 mod 4, so the primes need to be 3 mod 4 slightly more often to cancel out. It’s very surprising that it works this way—if you made me guess I would have quite strongly predicted that the prime powers would skew towards 1 mod 4, instead of the primes accommodating the squares like that.)
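As a quick illustration of the bias itself, here’s a minimal Python sketch (added for illustration; at any fixed cutoff the 3 mod 4 count is usually, though not always, slightly ahead):

```python
def prime_race_mod_4(limit):
    """Count primes up to `limit` in the residue classes 1 and 3 mod 4, via a simple sieve."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    counts = {1: 0, 3: 0}
    for p in range(3, limit + 1):          # skip 2, the only prime that is 2 mod 4
        if sieve[p]:
            counts[p % 4] += 1
    return counts

for limit in [10 ** 4, 10 ** 5, 10 ** 6]:
    c = prime_race_mod_4(limit)
    print(f"up to {limit}: 3 mod 4 -> {c[3]}, 1 mod 4 -> {c[1]}, lead = {c[3] - c[1]}")
```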
That post also describes another bias where prime gaps are surprisingly non-uniform, though once it’s pointed out that one feels much more obvious to me.
Another example is that the strong form of Cramér’s conjecture is false, i.e. the prime gaps are not as consistently small as you would expect based on a simple random model. I don’t have a good understanding of why this happens; people don’t seem shocked by it (but right now I feel kind of shocked).
I’m not really interested in diophantine equations where you can override a simple heuristic argument by factorization / analysis modulo fixed primes / dividing by common factors. It’s pretty easy to see why that happens, the heuristic argument seems like a good prima facie guess, and it’s easy to see how to adjust it once you notice new arguments.
It so happens that I asked a relevant question on MathOverflow recently. It’s very difficult to find any elementary explanation for this phenomenon.
That’s another great example.
Thinking through the heuristic argument (here $\lambda$ is the Liouville function, $\lambda(n) = (-1)^{\Omega(n)}$ where $\Omega(n)$ counts prime factors with multiplicity):
About $1/\log(n)$ of the numbers are prime and have value $-1$.
As we multiply by more and more factors, $\lambda$ should alternate between $\pm 1$.
But starting at $-1$ should introduce a bias towards $-1$.
It’s not clear how big the bias would be. A natural guess would be in the ballpark of $1/\log(n)$.
A bit more precisely, I think the number of prime factors is roughly 1 plus a Poisson with mean $\log\log n$. (I’m just getting that from some random Google results, e.g. here. But it seems intuitively right: each number $k$ has a probability of about $1/k$ of dividing $n$ and a probability of about $1/\log k$ of being prime, and the sum of $1/(k \log k)$ is $\log\log n$.)
Poisson distributions are more likely to be even than odd (and we’re adding 1 so more likely to be odd than even). This computation gives you a bias of order $e^{-2\lambda}$. Putting in $\lambda = \log\log n$ we get a bias of order $1/\log(n)^2$.
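For what it’s worth, the $e^{-2\lambda}$ figure is just the alternating sum of the Poisson pmf, and it’s easy to sanity-check numerically; here is a minimal Python sketch (added for illustration, not part of the original argument):

```python
import math

def poisson_parity_bias(lam, terms=120):
    """P(even) - P(odd) for a Poisson(lam) variable, summing its pmf with alternating signs."""
    total, term = 0.0, math.exp(-lam)      # term starts at P(X = 0)
    for k in range(terms):
        total += (-1) ** k * term
        term *= lam / (k + 1)              # recurrence: P(X = k+1) = P(X = k) * lam / (k + 1)
    return total

for lam in [0.5, 1.0, 2.0, math.log(math.log(10 ** 9))]:
    print(f"lam = {lam:.3f}: pmf sum = {poisson_parity_bias(lam):.6f}, exp(-2*lam) = {math.exp(-2 * lam):.6f}")
```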
That’s still way bigger than the real bias $1/\sqrt{n}$.
So tentatively I’m pretty surprised by how close this is to 50-50. I do feel like I’m probably overlooking something or messed up that argument. If not then this is a bit of a different situation from the other surprises, since there’s a heuristic argument suggesting a big bias that we don’t actually see. It would be good to understand what’s up here.
(From my perspective this is probably the more interesting kind of failure, since it suggests that this is a case where the heuristic argument really “was wrong” instead of merely overlooking something that could be pointed out by a more sophisticated argument.)
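To make the gap concrete, here’s a small Python sketch (added for illustration, using the Liouville function as above) that measures the observed bias $\left|\sum_{n \le N} \lambda(n)\right|/N$ against the two scales mentioned:

```python
import math

def omega_sieve(N):
    """Omega(n) = number of prime factors of n counted with multiplicity, for all n <= N."""
    omega = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:                      # nothing smaller divides p, so p is prime
            pk = p
            while pk <= N:                     # each prime power p^j <= N contributes 1
                for m in range(pk, N + 1, pk):
                    omega[m] += 1
                pk *= p
    return omega

N = 10 ** 6
Omega = omega_sieve(N)
L = sum((-1) ** Omega[n] for n in range(1, N + 1))   # summatory Liouville function L(N)
print("observed bias |L(N)|/N     :", abs(L) / N)
print("heuristic scale 1/log(N)^2 :", 1 / math.log(N) ** 2)
print("square-root scale 1/sqrt(N):", 1 / math.sqrt(N))
```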
The Poisson approximation is not good: it’s actually a known theorem that the number of prime factors of $N$ for $N$ large behaves like it’s distributed normally with mean $\log\log N + O(1)$ and standard deviation $(1+o(1))\sqrt{\log\log N}$. Since the whole even/odd distinction relies crucially on the Poisson approximation, it also fails here. (This part is incorrect, the distinction between Poisson and normal doesn’t matter in this case because both approximations are too low resolution to be useful. See my next comment in the thread.)
It’s easy to give a heuristic justification for this: the basic idea is that if you look at an interval $[N,2N]$ for $N$ large, then the indicator functions $\mathbf{1}_p : [N,2N] \to \{0,1\}$ which take the value 1 if $p$ divides the input and 0 otherwise are all “independent enough” for, say, $p \le N^{1/4}$.
Now, you want to know the distribution of
$$\text{number of prime divisors}(x) = \sum_{p \le 2N} \mathbf{1}_p(x)$$
Primes between $N^{1/4}$ and $2N$ don’t contribute much to this sum, because the sum of the reciprocals of the primes scales like $\log\log N$, so you can pretend they don’t exist. When you do that, the remaining indicator functions are all uncorrelated enough with each other thanks to the Chinese remainder theorem that you can apply some version of the central limit theorem to conclude this random variable defined on $[N,2N]$ basically behaves as if it’s normally distributed. It’s easy to check both its mean and variance are $\sim \log\log N$.
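As a quick empirical illustration of that last claim, here’s a sketch (added for illustration; $\omega(x)$ is the number of distinct prime divisors of $x$):

```python
import math
from statistics import mean, pvariance

def omega_up_to(M):
    """omega(n) = number of distinct prime divisors of n, for all n <= M, by a sieve."""
    omega = [0] * (M + 1)
    for p in range(2, M + 1):
        if omega[p] == 0:                 # no smaller prime divides p, so p is prime
            for m in range(p, M + 1, p):
                omega[m] += 1
    return omega

N = 200_000
w = omega_up_to(2 * N)[N:]                # omega(x) for x in [N, 2N]
print("mean of omega on [N, 2N]    :", mean(w))
print("variance of omega on [N, 2N]:", pvariance(w))
print("log log N                   :", math.log(math.log(N)))
```

Both the mean and the variance should come out within an additive constant of $\log\log N$, which is all this “low resolution” normal picture sees.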
That’s the same heuristic argument I was imagining for why it would be Poisson (except I wasn’t being careful and thinking about CRT, or explicitly cutting out the large primes, just naively assuming independence by default). When mean and variance are equal, isn’t a Poisson distribution extremely close to a normal distribution? Or put differently: if you are adding up coins whose probability of heads is close to 0, I’d expect the Poisson approximation to be as good as the normal approximation. There are higher-order differences between the two distributions, but is it specifically known to be more normal than Poisson?
I guess the big problem is that you need an extremely accurate approximation to get at the parity, and neither of these approximations is very accurate? E.g. adding 1 is a pretty small impact on the distribution, but a huge impact on the parity.
Another pass at the heuristic argument:
The number of copies of a prime $p$ that divides $x$ is geometric. The probability that the count is odd is $1/(p+1)$. I’d guess these events are all pretty independent for $p < x^{1/4}$ based on what you said.
If we have a bunch of coins with probability $1/(p+1)$ of being odd, the probability of the sum being even is $\tfrac{1}{2} + \tfrac{1}{2}\prod_p\left(1 - \tfrac{2}{p+1}\right)$.
That’s a bias of order $1/\log(x)^2$ again.
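A quick numerical look at that product (a sketch; I’m assuming the product runs over primes $p < x^{1/4}$ as above, and this only checks the order of magnitude, not the constant):

```python
import math

def primes_up_to(M):
    """Primes up to M via the sieve of Eratosthenes."""
    sieve = [True] * (M + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(M ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, flag in enumerate(sieve) if flag]

for x in [10 ** 8, 10 ** 12, 10 ** 16]:
    prod = 1.0
    for p in primes_up_to(round(x ** 0.25)):   # primes below x^(1/4)
        prod *= 1 - 2 / (p + 1)
    print(f"x = 1e{round(math.log10(x))}: bias ~ {0.5 * prod:.5f}, 1/log(x)^2 = {1 / math.log(x) ** 2:.5f}")
```

By a Mertens-type estimate the product should be a small constant multiple of $1/\log(x)^2$, so the order matches even though the constants don’t.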
So it seems like the count of small prime divisors should have a very significant bias towards being even.
So I guess all the action is in the large prime divisors, and probably from the anticorrelation with the count of small prime divisors (seems like the magic is that 0 isn’t a valid answer). I guess we kind of knew that anyway, since it’s going to flip the sign of the bias. In expectation there are $O(1)$ large prime divisors, but that’s still enough to totally dominate the calculation.
This doesn’t feel like a promising line of attack. Would be interesting to check the claim that the count of small prime divisors with multiplicities is quite even-skewed.
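A crude simulation of that claim (a sketch; I’m reading “small” as primes below roughly $x^{1/4}$, and counting with multiplicities):

```python
import random

def small_prime_factor_count(x, cutoff):
    """Prime factors of x counted with multiplicity, among primes <= cutoff (trial division)."""
    count, d = 0, 2
    while d <= cutoff:
        while x % d == 0:          # if d divides x here, d is prime (smaller factors already removed)
            count += 1
            x //= d
        d += 1 if d == 2 else 2    # trial divisors 2, 3, 5, 7, ...
    return count

rng = random.Random(0)
N = 10 ** 9
cutoff = round(N ** 0.25)          # "small" primes: below roughly x^(1/4)
samples = 100_000
even = sum(1 for _ in range(samples)
           if small_prime_factor_count(rng.randrange(N, 2 * N), cutoff) % 2 == 0)
print("fraction with an even small-prime count:", even / samples)   # expect a visible excess over 1/2
```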
I agree with all of that, and I think if you simulated it you’d indeed find a large bias for the number of small prime divisors to be even. The problem is $1/\log(x)^2$ is such a small bias in expectation that it will already be offset by just the primes in the interval $[x,2x]$, which contribute on the order of $\sim 1/\log x$ to the average in the opposite direction, since all primes trivially have an odd number of prime factors.
Furthermore, there are subtleties here about exactly how much independence you need. If you want everything to be jointly independent then you can really only work with the primes up to $\log x$ while being safe; this is because the product of the primes up to $x$ is roughly of order $e^x$. Once you go past that, while correlations involving only a small number of primes are still fine, correlations involving lots of primes break down, and for parity problems you need to control the entire joint distribution in a fine way.
This is not a problem for the normal approximation because to show convergence in distribution to a normal distribution, you just need to show all the moments converge to the right values and use a result like Stone-Weierstrass to approximate any continuous test function uniformly by a polynomial. You can do this just by working with primes up to $x^{\alpha}$ for $\alpha$ depending on the exact moment you’re studying. However, this result is really “low resolution”, as you correctly identify.
A couple of examples from quadratic residue patterns modulo primes:
Example 1. Let $p$ be a large prime. How many elements $x \pmod{p}$ are there such that both $x$ and $x+1$ are quadratic residues?
Since half of the elements mod $p$ are quadratic residues and the events “$x$ is a QR” and “$x+1$ is a QR” look like they are independent, a reasonable guess is $p/4$. This is the correct main term, but what about the error? A natural square-root error term is not right: one can show that the error is $O(1)$, the error depending only on whether $p$ is 1 or 3 mod 4. (The proof is by elementary manipulations with the Legendre symbol, see here.) So there’s hidden structure that makes the error smaller than what a naive randomness heuristic suggests.
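A small check of Example 1 (a sketch; quadratic residues are taken to be nonzero and tested via Euler’s criterion):

```python
def is_qr(a, p):
    """True if a is a nonzero quadratic residue mod p (Euler's criterion)."""
    return a % p != 0 and pow(a, (p - 1) // 2, p) == 1

def consecutive_qr_pairs(p):
    """Count x in [0, p-1] such that x and x+1 are both quadratic residues mod p."""
    return sum(1 for x in range(p) if is_qr(x, p) and is_qr(x + 1, p))

for p in [101, 103, 10007, 10009]:
    count = consecutive_qr_pairs(p)
    print(f"p = {p} ({p % 4} mod 4): count = {count}, p/4 = {p / 4:.2f}, error = {count - p / 4:+.2f}")
```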
Example 2. Let $p$ be a large prime. How many elements $x \pmod{p}$ are such that all of $x$, $x+1$ and $x+2$ are quadratic residues?
Again, the obvious guess for the main term is correct (there are roughly $p/8$ such $x$), so consider the error term. The error is not $O(1)$ this time! The next guess is a square-root error term. Perhaps as $p$ ranges over the primes, the error term is (after suitable normalization) normally distributed (as is motivated by the central limit theorems)? This is not correct either!
The error is in fact bounded in absolute value by $\sqrt{p}/4$, following from a bound on the number of points on elliptic curves modulo $p$. And for the distribution of the error, if one normalizes the error by dividing by $\sqrt{p}/4$ (so that the resulting value is in $[-1,1]$), the distribution behaves like $\cos(U)$, where $U$ is uniformly distributed on $[-\pi,\pi]$ as $p$ ranges over the primes (see here). This is a deep result, which is not easy to motivate in a couple of sentences, but again there’s hidden structure that the naive randomness heuristic does not account for.
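And a matching sketch for Example 2, printing the error normalized by $\sqrt{p}/4$ for a few primes (the values should land in roughly $[-1,1]$, up to small boundary effects from $x$ near $0$ and $p$):

```python
def is_qr(a, p):
    """True if a is a nonzero quadratic residue mod p (Euler's criterion)."""
    return a % p != 0 and pow(a, (p - 1) // 2, p) == 1

def normalized_triple_error(p):
    """((# of x with x, x+1, x+2 all quadratic residues mod p) - p/8) / (sqrt(p)/4)."""
    count = sum(1 for x in range(p)
                if is_qr(x, p) and is_qr(x + 1, p) and is_qr(x + 2, p))
    return (count - p / 8) / (p ** 0.5 / 4)

for p in [10007, 10009, 100003, 100019]:
    print(p, round(normalized_triple_error(p), 3))
```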
(And no, one does not get a normal distribution for longer streaks of quadratic residues either.)