I am extremely interested in examples of heuristic arguments like this where the naive version leads you astray and it takes much more sophisticated arguments to fix the problem. I’d be happy to pay a $500 bounty for any examples that I find really interesting, in number theory or elsewhere. (Usual disclaimers, my judgment of what counts as interesting will be completely arbitrary and unaccountable, etc.)
The best example I found was the Chebyshev bias that primes are very slightly more likely to be 3 mod 4 than 1 mod 4. The simplest explanation I know of this goes through an explicit formula for the von Mangoldt function’s distribution mod 4 as explained by Terry Tao here.
(Roughly speaking: the prime powers should be uniform mod 4, based on an argument about L-functions that I don’t understand. But the prime squares are all 1 mod 4, so the primes need to be 3 mod 4 slightly more often to cancel out. It’s very surprising that it works this way—if you made me guess I would have quite strongly predicted that the prime powers would skew towards 1 mod 4, instead of the primes accommodating the squares like that.)
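As a quick illustration of the bias itself, here’s a minimal Python sketch (added for illustration; at any fixed cutoff the 3 mod 4 count is usually, though not always, slightly ahead):

```python
def prime_race_mod_4(limit):
    """Count primes up to `limit` in the residue classes 1 and 3 mod 4, via a simple sieve."""
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    counts = {1: 0, 3: 0}
    for p in range(3, limit + 1):          # skip 2, the only prime that is 2 mod 4
        if sieve[p]:
            counts[p % 4] += 1
    return counts

for limit in [10 ** 4, 10 ** 5, 10 ** 6]:
    c = prime_race_mod_4(limit)
    print(f"up to {limit}: 3 mod 4 -> {c[3]}, 1 mod 4 -> {c[1]}, lead = {c[3] - c[1]}")
```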
That post also describes another bias where prime gaps are surprisingly non-uniform, though once it’s pointed out that one feels much more obvious to me.
Another example is that the strong form of Cramér’s conjecture is false, i.e. the prime gaps are not as consistently small as you would expect based on a simple random model. I don’t have a good understanding of why this happens; people don’t seem shocked by it (but right now I feel kind of shocked).
I’m not really interested in diophantine equations where you can override a simple heuristic argument by factorization / analysis modulo fixed primes / dividing by common factors. It’s pretty easy to see why that happens, the heuristic argument seems like a good prima facie guess, and it’s easy to see how to adjust it once you notice new arguments.
It so happens that I asked a relevant question on MathOverflow recently. It’s very difficult to find any elementary explanation for this phenomenon.
That’s another great example.
Thinking through the heuristic argument (here $\lambda$ is the Liouville function, $\lambda(n) = (-1)^{\Omega(n)}$ where $\Omega(n)$ counts prime factors with multiplicity):
About $1/\log(n)$ of the numbers are prime and have value $-1$.
As we multiply by more and more factors, $\lambda$ should alternate between $\pm 1$.
But starting at $-1$ should introduce a bias towards $-1$.
It’s not clear how big the bias would be. A natural guess would be in the ballpark of $1/\log(n)$.
A bit more precisely, I think the number of prime factors is roughly 1 plus a Poisson with mean $\log\log n$. (I’m just getting that from some random Google results, e.g. here. But it seems intuitively right: each number $k$ has a probability of about $1/k$ of dividing $n$ and a probability of about $1/\log k$ of being prime, and the sum of $1/(k \log k)$ is $\log\log n$.)
Poisson distributions are more likely to be even than odd (and we’re adding 1 so more likely to be odd than even). This computation gives you a bias of order $e^{-2\lambda}$. Putting in $\lambda = \log\log n$ we get a bias of order $1/\log(n)^2$.
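For what it’s worth, the $e^{-2\lambda}$ figure is just the alternating sum of the Poisson pmf, and it’s easy to sanity-check numerically; here is a minimal Python sketch (added for illustration, not part of the original argument):

```python
import math

def poisson_parity_bias(lam, terms=120):
    """P(even) - P(odd) for a Poisson(lam) variable, summing its pmf with alternating signs."""
    total, term = 0.0, math.exp(-lam)      # term starts at P(X = 0)
    for k in range(terms):
        total += (-1) ** k * term
        term *= lam / (k + 1)              # recurrence: P(X = k+1) = P(X = k) * lam / (k + 1)
    return total

for lam in [0.5, 1.0, 2.0, math.log(math.log(10 ** 9))]:
    print(f"lam = {lam:.3f}: pmf sum = {poisson_parity_bias(lam):.6f}, exp(-2*lam) = {math.exp(-2 * lam):.6f}")
```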
That’s still way bigger than the real bias $1/\sqrt{n}$.
So tentatively I’m pretty surprised by how close this is to 50-50. I do feel like I’m probably overlooking something or messed up that argument. If not then this is a bit of a different situation from the other surprises, since there’s a heuristic argument suggesting a big bias that we don’t actually see. It would be good to understand what’s up here.
(From my perspective this is probably the more interesting kind of failure, since it suggests that this is a case where the heuristic argument really “was wrong” instead of merely overlooking something that could be pointed out by a more sophisticated argument.)
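To make the gap concrete, here’s a small Python sketch (added for illustration, using the Liouville function as above) that measures the observed bias $\left|\sum_{n \le N} \lambda(n)\right|/N$ against the two scales mentioned:

```python
import math

def omega_sieve(N):
    """Omega(n) = number of prime factors of n counted with multiplicity, for all n <= N."""
    omega = [0] * (N + 1)
    for p in range(2, N + 1):
        if omega[p] == 0:                      # nothing smaller divides p, so p is prime
            pk = p
            while pk <= N:                     # each prime power p^j <= N contributes 1
                for m in range(pk, N + 1, pk):
                    omega[m] += 1
                pk *= p
    return omega

N = 10 ** 6
Omega = omega_sieve(N)
L = sum((-1) ** Omega[n] for n in range(1, N + 1))   # summatory Liouville function L(N)
print("observed bias |L(N)|/N     :", abs(L) / N)
print("heuristic scale 1/log(N)^2 :", 1 / math.log(N) ** 2)
print("square-root scale 1/sqrt(N):", 1 / math.sqrt(N))
```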
The Poisson approximation is not good: it’s actually a known theorem that the number of prime factors of $N$ for $N$ large behaves like it’s distributed normally with mean $\log\log N + O(1)$ and standard deviation $(1+o(1))\sqrt{\log\log N}$. Since the whole even/odd distinction relies crucially on the Poisson approximation, it also fails here. (This part is incorrect, the distinction between Poisson and normal doesn’t matter in this case because both approximations are too low resolution to be useful. See my next comment in the thread.)
It’s easy to give a heuristic justification for this: the basic idea is that if you look at an interval $[N,2N]$ for $N$ large, then the indicator functions $\mathbf{1}_p : [N,2N] \to \{0,1\}$ which take the value 1 if $p$ divides the input and 0 otherwise are all “independent enough” for, say, $p \le N^{1/4}$.
Now, you want to know the distribution of
$$\text{number of prime divisors}(x) = \sum_{p \le 2N} \mathbf{1}_p(x)$$
Primes between $N^{1/4}$ and $2N$ don’t contribute much to this sum, because the sum of the reciprocals of the primes scales like $\log\log N$, so you can pretend they don’t exist. When you do that, the remaining indicator functions are all uncorrelated enough with each other thanks to the Chinese remainder theorem that you can apply some version of the central limit theorem to conclude this random variable defined on $[N,2N]$ basically behaves as if it’s normally distributed. It’s easy to check both its mean and variance are $\sim \log\log N$.
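As a quick empirical illustration of that last claim, here’s a sketch (added for illustration; $\omega(x)$ is the number of distinct prime divisors of $x$):

```python
import math
from statistics import mean, pvariance

def omega_up_to(M):
    """omega(n) = number of distinct prime divisors of n, for all n <= M, by a sieve."""
    omega = [0] * (M + 1)
    for p in range(2, M + 1):
        if omega[p] == 0:                 # no smaller prime divides p, so p is prime
            for m in range(p, M + 1, p):
                omega[m] += 1
    return omega

N = 200_000
w = omega_up_to(2 * N)[N:]                # omega(x) for x in [N, 2N]
print("mean of omega on [N, 2N]    :", mean(w))
print("variance of omega on [N, 2N]:", pvariance(w))
print("log log N                   :", math.log(math.log(N)))
```

Both the mean and the variance should come out within an additive constant of $\log\log N$, which is all this “low resolution” normal picture sees.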
That’s the same heuristic argument I was imagining for why it would be Poisson (except I wasn’t being careful and thinking about CRT, or explicitly cutting out the large primes, just naively assuming independence by default). When mean and variance are equal, isn’t a Poisson distribution extremely close to a normal distribution? Or put differently: if you are adding up coins whose probability of heads is close to 0, I’d expect the Poisson approximation to be as good as the normal approximation. There are higher-order differences between the two distributions, but is it specifically known to be more normal than Poisson?
I guess the big problem is that you need an extremely accurate approximation to get at the parity, and neither of these approximations is very accurate? E.g. adding 1 is a pretty small impact on the distribution, but a huge impact on the parity.
Another pass at the heuristic argument:
The number of copies of a prime $p$ that divides $x$ is geometric. The probability that the count is odd is $1/(p+1)$. I’d guess these events are all pretty independent for $p < x^{1/4}$ based on what you said.
If we have a bunch of coins with probability $1/(p+1)$ of being odd, the probability of the sum being even is $\tfrac{1}{2} + \tfrac{1}{2}\prod_p\left(1 - \tfrac{2}{p+1}\right)$.
That’s a bias of order $1/\log(x)^2$ again.
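A quick numerical look at that product (a sketch; I’m assuming the product runs over primes $p < x^{1/4}$ as above, and this only checks the order of magnitude, not the constant):

```python
import math

def primes_up_to(M):
    """Primes up to M via the sieve of Eratosthenes."""
    sieve = [True] * (M + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(M ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, flag in enumerate(sieve) if flag]

for x in [10 ** 8, 10 ** 12, 10 ** 16]:
    prod = 1.0
    for p in primes_up_to(round(x ** 0.25)):   # primes below x^(1/4)
        prod *= 1 - 2 / (p + 1)
    print(f"x = 1e{round(math.log10(x))}: bias ~ {0.5 * prod:.5f}, 1/log(x)^2 = {1 / math.log(x) ** 2:.5f}")
```

By a Mertens-type estimate the product should be a small constant multiple of $1/\log(x)^2$, so the order matches even though the constants don’t.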
So it seems like the count of small prime divisors should have a very significant bias towards being even.
So I guess all the action is in the large prime divisors, and probably from the anticorrelation with the count of small prime divisors (seems like the magic is that 0 isn’t a valid answer). I guess we kind of knew that anyway, since it’s going to flip the sign of the bias. In expectation there are $O(1)$ large prime divisors, but that’s still enough to totally dominate the calculation.
This doesn’t feel like a promising line of attack. Would be interesting to check the claim that the count of small prime divisors with multiplicities is quite even-skewed.
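A crude simulation of that claim (a sketch; I’m reading “small” as primes below roughly $x^{1/4}$, and counting with multiplicities):

```python
import random

def small_prime_factor_count(x, cutoff):
    """Prime factors of x counted with multiplicity, among primes <= cutoff (trial division)."""
    count, d = 0, 2
    while d <= cutoff:
        while x % d == 0:          # if d divides x here, d is prime (smaller factors already removed)
            count += 1
            x //= d
        d += 1 if d == 2 else 2    # trial divisors 2, 3, 5, 7, ...
    return count

rng = random.Random(0)
N = 10 ** 9
cutoff = round(N ** 0.25)          # "small" primes: below roughly x^(1/4)
samples = 100_000
even = sum(1 for _ in range(samples)
           if small_prime_factor_count(rng.randrange(N, 2 * N), cutoff) % 2 == 0)
print("fraction with an even small-prime count:", even / samples)   # expect a visible excess over 1/2
```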
I agree with all of that, and I think if you simulated it you’d indeed find a large bias for the number of small prime divisors to be even. The problem is $1/\log(x)^2$ is such a small bias in expectation that it will already be offset by just the primes in the interval $[x,2x]$, which contribute on the order of $\sim 1/\log x$ to the average in the opposite direction, since all primes trivially have an odd number of prime factors.
Furthermore, there are subtleties here about exactly how much independence you need. If you want everything to be jointly independent then you can really only work with the primes up to $\log x$ while being safe; this is because the product of the primes up to $x$ is roughly of order $e^x$. Once you go past that, while correlations involving only a small number of primes are still fine, correlations involving lots of primes break down, and for parity problems you need to control the entire joint distribution in a fine way.
This is not a problem for the normal approximation because to show convergence in distribution to a normal distribution, you just need to show all the moments converge to the right values and use a result like Stone-Weierstrass to approximate any continuous test function uniformly by a polynomial. You can do this just by working with primes up to $x^{\alpha}$ for $\alpha$ depending on the exact moment you’re studying. However, this result is really “low resolution”, as you correctly identify.
A couple of examples from quadratic residue patterns modulo primes:
Example 1. Let $p$ be a large prime. How many elements $x \pmod{p}$ are there such that both $x$ and $x+1$ are quadratic residues?
Since half of the elements mod $p$ are quadratic residues and the events “$x$ is a QR” and “$x+1$ is a QR” look like they are independent, a reasonable guess is $p/4$. This is the correct main term, but what about the error? A natural square-root error term is not right: one can show that the error is $O(1)$, the error depending only on whether $p$ is 1 or 3 mod 4. (The proof is by elementary manipulations with the Legendre symbol, see here.) So there’s hidden structure that makes the error smaller than what a naive randomness heuristic suggests.
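A small check of Example 1 (a sketch; quadratic residues are taken to be nonzero and tested via Euler’s criterion):

```python
def is_qr(a, p):
    """True if a is a nonzero quadratic residue mod p (Euler's criterion)."""
    return a % p != 0 and pow(a, (p - 1) // 2, p) == 1

def consecutive_qr_pairs(p):
    """Count x in [0, p-1] such that x and x+1 are both quadratic residues mod p."""
    return sum(1 for x in range(p) if is_qr(x, p) and is_qr(x + 1, p))

for p in [101, 103, 10007, 10009]:
    count = consecutive_qr_pairs(p)
    print(f"p = {p} ({p % 4} mod 4): count = {count}, p/4 = {p / 4:.2f}, error = {count - p / 4:+.2f}")
```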
Example 2. Let $p$ be a large prime. How many elements $x \pmod{p}$ are such that all of $x$, $x+1$ and $x+2$ are quadratic residues?
Again, the obvious guess for the main term is correct (there are roughly $p/8$ such $x$), so consider the error term. The error is not $O(1)$ this time! The next guess is a square-root error term. Perhaps as $p$ ranges over the primes, the error term is (after suitable normalization) normally distributed (as is motivated by the central limit theorems)? This is not correct either!
The error is in fact bounded in absolute value by $\sqrt{p}/4$, following from a bound on the number of points on elliptic curves modulo $p$. And for the distribution of the error, if one normalizes the error by dividing by $\sqrt{p}/4$ (so that the resulting value is in $[-1,1]$), the distribution behaves like $\cos(U)$, where $U$ is uniformly distributed on $[-\pi,\pi]$ as $p$ ranges over the primes (see here). This is a deep result, which is not easy to motivate in a couple of sentences, but again there’s hidden structure that the naive randomness heuristic does not account for.
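And a matching sketch for Example 2, printing the error normalized by $\sqrt{p}/4$ for a few primes (the values should land in roughly $[-1,1]$, up to small boundary effects from $x$ near $0$ and $p$):

```python
def is_qr(a, p):
    """True if a is a nonzero quadratic residue mod p (Euler's criterion)."""
    return a % p != 0 and pow(a, (p - 1) // 2, p) == 1

def normalized_triple_error(p):
    """((# of x with x, x+1, x+2 all quadratic residues mod p) - p/8) / (sqrt(p)/4)."""
    count = sum(1 for x in range(p)
                if is_qr(x, p) and is_qr(x + 1, p) and is_qr(x + 2, p))
    return (count - p / 8) / (p ** 0.5 / 4)

for p in [10007, 10009, 100003, 100019]:
    print(p, round(normalized_triple_error(p), 3))
```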
(And no, one does not get a normal distribution for longer streaks of quadratic residues either.)