# The Goldbach conjecture is probably correct; so was Fermat’s last theorem

EDIT: Added a section on Euler’s conjecture.

# The Goldbach conjecture is likely

The Goldbach conjecture is that “every even integer above two is the sum of two primes”. For example, $4 = 2 + 2$, $6 = 3 + 3$, $8 = 3 + 5$, and so on.
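These small cases are easy to check by brute force. A minimal sketch (the helper names `is_prime` and `goldbach_pair` are mine, not from the post):

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def goldbach_pair(n: int):
    """Return the first pair of primes (p, q), p <= q, with p + q == n, else None."""
    for p in range(2, n // 2 + 1):
        if is_prime(p) and is_prime(n - p):
            return (p, n - p)
    return None

# Every even number from 4 up to 998 should have such a pair.
assert all(goldbach_pair(n) is not None for n in range(4, 1000, 2))
print(goldbach_pair(4), goldbach_pair(6), goldbach_pair(8))  # (2, 2) (3, 3) (3, 5)
```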

Though this is a mathematically precise statement, we can talk about the “probability” of it being correct. How so?

Well, by the prime number theorem, the probability of a random number less than $N$ being prime is $1/\log(N)$. So if we sum up all pairs of primes less than $N$, we get about $(N/\log(N))^2$ different sums; these sums will all be less than $2N$.

So, is $N$ itself one of these sums? Well, the “probability” that it’s not the total of any given sum is $1 - \frac{1}{2N}$; therefore the probability of it being the total of none of the sums is:

$$\left(1 - \frac{1}{2N}\right)^{(N/\log(N))^2} \approx e^{-N/(2\log(N)^2)}.$$

So the probability of $N$ being the total of such a sum is roughly:

$$1 - e^{-N/(2\log(N)^2)}.$$

Therefore, the probability of all numbers being the total of such a sum is roughly:

$$\prod_{N} \left(1 - e^{-N/(2\log(N)^2)}\right).$$

Now, the infinite product $\prod_N (1 - x_N)$ converges to a non-zero number if and only if the sum $\sum_N x_N$ converges to a finite number. That series, $\sum_N e^{-N/(2\log(N)^2)}$, can be seen to be convergent (for example, by noting that $e^{-N/(2\log(N)^2)} \leq 1/N^2$ for large enough $N$ and using the comparison test).

If we use computers to get an estimate of $\prod_N \left(1 - e^{-N/(2\log(N)^2)}\right)$, we get a pretty low probability. However, most of that improbability mass is on the low numbers, and the Goldbach conjecture has been tested up to $4 \cdot 10^{18}$. So, if we assume it’s valid up to $4 \cdot 10^{18}$, we numerically get a probability extremely close to $1$.
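This numerical estimate can be sketched in a few lines. It is purely the heuristic model above, not a statement about the actual primes, and the cutoffs (a stand-in bound of 10,000 rather than $4 \cdot 10^{18}$) are mine:

```python
import math

def survival(N: int) -> float:
    """Heuristic probability that N is a sum of two 'random primes':
    1 - exp(-N / (2 log(N)^2))."""
    return 1.0 - math.exp(-N / (2 * math.log(N) ** 2))

def log_product(start: int, stop: int = 100_000) -> float:
    """log of prod over even N in [start, stop) of survival(N); terms
    beyond 'stop' are vanishingly close to log(1) = 0."""
    return sum(math.log(survival(N)) for N in range(start, stop, 2))

# Starting at 4, the product is tiny: the improbability mass sits on small N.
print(math.exp(log_product(4)))
# Assuming the conjecture already holds up to some large bound, the
# remaining product is essentially 1.
print(math.exp(log_product(10_000)))
```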

So the Goldbach conjecture is pretty likely, and the more examples we discover where it holds, the more likely it is to hold all the way to infinity.

## “Probabilities” of logical facts

The above reasoning seems dubious. The primes are not defined by random sampling among the natural numbers; quite to the contrary, they come from a mathematical rule of extreme precision. So what do these probabilities mean?

Let $S$ be an infinite set of numbers, selected from the natural numbers in a way that looks like the prime number theorem (eg the $n$-th element of $S$ is approximately $n \log(n)$). Then what we’ve shown is that, if such an $S$ obeys the “$S$-Goldbach conjecture” up to $4 \cdot 10^{18}$, then we’d expect it to go all the way to infinity.
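This is easy to simulate: sample a random prime-like set $S$ and see where the $S$-Goldbach conjecture fails. The thresholds and names below are mine, and the exact misses depend on the random seed:

```python
import math
import random

random.seed(0)  # deterministic run

def primelike_set(limit: int) -> set:
    """Include each n >= 3 independently with probability 1/log(n),
    mimicking the density given by the prime number theorem."""
    return {n for n in range(3, limit) if random.random() < 1 / math.log(n)}

S = primelike_set(10_000)

# Which numbers (above a small threshold, where the model is poor)
# are NOT a sum of two elements of S?
missed = [n for n in range(100, 10_000)
          if not any(n - a in S for a in S)]
print(len(missed), missed[:10])
```

Typically a handful of small numbers are missed, and nothing large: exactly the pattern the heuristic predicts.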

Thus the Goldbach conjecture can be restated as “in terms of sums of two elements, the prime numbers behave like a typical sequence selected in a prime-number-theorem way”.

So the Goldbach conjecture is not saying that there is something special about the primes; in fact, it’s saying the opposite: that the primes are typical of similar sequences, that nothing in the specific way the primes are selected has an impact on the sum of two primes. So the Goldbach conjecture is essentially saying “there is no obstruction to the primes being typical in this way”.

## One obstruction

Did you notice that, so far, at no point did I require $N$ to be an even number? But all the primes except for $2$ are odd. So the distribution of sums of two primes is very (very!) heavily skewed towards even numbers; most odd numbers will not appear at all. So, that is one clear obstruction to the possible values of the sum, coming from the way the primes are constructed. The Goldbach conjecture is therefore saying that there are no additional obstructions beyond this one condition on parity.

In fact, the Goldbach conjecture has changed over time: $1$ used to be seen as a prime number, and the original conjecture included $2 = 1 + 1$ as another example. Then $1$ was removed from the list of prime numbers, and it turned out, as far as we can tell, that $2$ was the only even number we lost from the list of sums.

If we removed $2$ from the list of primes, we’d only lose $4$ as a sum. Similarly, if we strike out the first $m$ primes, we expect, on probabilistic grounds, that “all numbers greater than a given $n_m$ are the sums of two primes (first $m$ primes not included)”. If that were to fail, then there’s a really interesting obstruction out there.
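A brute-force check of this in a small range (the function names and the bound of 500 are mine):

```python
def is_prime(n: int) -> bool:
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def missing_sums(limit: int, skip: int) -> list:
    """Even numbers in [4, limit) that are NOT a sum of two primes,
    once the first `skip` primes are struck from the list."""
    primes = [p for p in range(2, limit) if is_prime(p)][skip:]
    pset = set(primes)
    return [n for n in range(4, limit, 2)
            if not any(p <= n - p and n - p in pset for p in primes)]

print(missing_sums(500, 0))  # [] : Goldbach holds in this range
print(missing_sums(500, 1))  # [4] : dropping 2 loses only 4 as a sum
print(missing_sums(500, 3))  # a handful of small even numbers, then nothing
```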

# Fermat’s last theorem was likely (for n > 3)

We can show, similarly, that Fermat’s last theorem was very likely on probabilistic grounds. The theorem states that, for $n \geq 3$, there do not exist natural numbers $x$, $y$, and $z$ such that $x^n + y^n = z^n$.

Fix $z$ and $n$. Counting $1$ and $z$, there are $z$ natural numbers less than or equal to $z$. Therefore there are $z^2$ possible sums $x^n + y^n$ with $x, y \leq z$, all less than $2z^n$. So the probability that any two of these $n$-th powers sum to $z^n$ is $z^2/(2z^n) = z^{2-n}/2$.

So the probability that there are no $x$ and $y$ such that $x^n + y^n = z^n$, for any $z$, is roughly:

$$\prod_{z} \left(1 - \frac{z^{2-n}}{2}\right).$$

The sum $\sum_z z^{2-n}/2$ converges for $n \geq 4$. Moreover, we can also sum over $n$: $\sum_{n \geq 4} \sum_z \frac{z^{2-n}}{2} = \sum_z \frac{1}{2z(z-1)}$, which also converges. So the probability of Fermat’s last theorem was non-zero, at least for $n \geq 4$; add on the fact that the theorem was proved for many $n$ and checked for many $n$ and $z$, and, even before it was proved, it was very probable it was correct.
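We can check the convergence numerically. In fact the double sum telescopes: $\sum_{n \geq 4} z^{2-n} = \frac{1}{z(z-1)}$, and $\sum_{z \geq 2} \frac{1}{2z(z-1)} = \frac{1}{2}$ exactly, so under this heuristic the “expected number of counterexamples” to Fermat’s last theorem over all $n \geq 4$ is about one half. A quick confirmation (truncation bounds are mine):

```python
def inner_sum(n: int, zmax: int = 5000) -> float:
    """sum_{z=2}^{zmax} z^(2-n) / 2 for a fixed exponent n."""
    return sum(z ** (2 - n) / 2 for z in range(2, zmax + 1))

# Sum over exponents n = 4..59; terms for larger n are negligible.
total = sum(inner_sum(n) for n in range(4, 60))
print(total)  # close to 0.5
```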

So Andrew Wiles’s genius was in showing there were no unexpected obstructions for the “likely” outcome to be true. That’s why the proof is so hard: he was trying to prove something very “likely”, and show an absence of structure, rather than a presence, without knowing what that structure could be.

# Euler’s conjecture was unlikely

Euler’s conjecture was that you needed to sum at least $n$ $n$-th powers to get another $n$-th power; Fermat’s last theorem establishes this for $n = 3$, and Euler theorised that this extended to all higher $n$.

Euler’s conjecture is in fact false; for $n = 4$ we have three fourth powers that sum to another fourth power:

$$95800^4 + 217519^4 + 414560^4 = 422481^4.$$

There are counterexamples known for $n = 5$ as well, so the conjecture is false, and not just for one value of $n$.
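Both counterexamples are small enough to verify directly with exact integer arithmetic; these are the well-known identities of Frye ($n = 4$) and Lander and Parkin ($n = 5$):

```python
# Frye's counterexample for n = 4: three fourth powers summing to a fourth power.
assert 95800**4 + 217519**4 + 414560**4 == 422481**4

# Lander and Parkin's counterexample for n = 5: four fifth powers.
assert 27**5 + 84**5 + 110**5 + 133**5 == 144**5

print("both counterexamples check out")
```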

More interesting from our perspective, we expect it to be false on probabilistic grounds. Recall that the argument about Fermat’s last theorem does not work for $n = 3$; it fails because the crucial sum is of the type $\sum_z \frac{1}{2z}$, which diverges.

Similarly, if we estimate the probability of Euler’s conjecture, we get terms like the following (for some constants $c_n > 0$):

$$\prod_{z} \left(1 - \frac{c_n}{z}\right).$$

This goes to zero for the same reason as the $n = 3$ case.
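The contrast between the divergent and convergent cases is easy to see numerically (the cutoffs are mine):

```python
def half_harmonic(zmax: int) -> float:
    """sum_{z=2}^{zmax-1} 1/(2z): the divergent n = 3 / Euler-type sum."""
    return sum(1 / (2 * z) for z in range(2, zmax))

def half_zeta2(zmax: int) -> float:
    """sum_{z=2}^{zmax-1} 1/(2 z^2): the convergent n = 4 analogue."""
    return sum(1 / (2 * z * z) for z in range(2, zmax))

# The first keeps growing (roughly log(zmax)/2), so the corresponding
# product tends to 0; the second settles down to a fixed value.
for zmax in (10**2, 10**4, 10**6):
    print(zmax, round(half_harmonic(zmax), 2), round(half_zeta2(zmax), 4))
```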

So, on probabilistic grounds, we expect Fermat’s last theorem to be true for $n > 3$, and we expect Euler’s conjecture to be false.

The only unexpected result here is that Fermat’s last theorem and Euler’s conjecture are true for $n = 3$. So something about the structure of the problem for $n = 3$ is moving the result away from the probabilistic outcome.

# The “Stuart conjecture”

Based on what I wrote about Euler’s conjecture, I’ll hazard a conjecture myself, which I believe to be true on probabilistic grounds. Namely that if there are $k$ integers whose $n$-th powers sum non-trivially to another $n$-th power, then $k$ is greater than or equal to $n - 1$.

Fermat’s last theorem shows this is true for $n = 3$ and $n = 4$.

• Reminiscent of Freeman Dyson’s 2005 answer to the question: “what do you believe is true even though you cannot prove it?”:

Since I am a mathematician, I give a precise answer to this question. Thanks to Kurt Gödel, we know that there are true mathematical statements that cannot be proved. But I want a little more than this. I want a statement that is true, unprovable, and simple enough to be understood by people who are not mathematicians. Here it is.
Numbers that are exact powers of two are 2, 4, 8, 16, 32, 64, 128 and so on. Numbers that are exact powers of five are 5, 25, 125, 625 and so on. Given any number such as 131072 (which happens to be a power of two), the reverse of it is 270131, with the same digits taken in the opposite order. Now my statement is: it never happens that the reverse of a power of two is a power of five.
The digits in a big power of two seem to occur in a random way without any regular pattern. If it ever happened that the reverse of a power of two was a power of five, this would be an unlikely accident, and the chance of it happening grows rapidly smaller as the numbers grow bigger. If we assume that the digits occur at random, then the chance of the accident happening for any power of two greater than a billion is less than one in a billion. It is easy to check that it does not happen for powers of two smaller than a billion. So the chance that it ever happens at all is less than one in a billion. That is why I believe the statement is true.
But the assumption that digits in a big power of two occur at random also implies that the statement is unprovable. Any proof of the statement would have to be based on some non-random property of the digits. The assumption of randomness means that the statement is true just because the odds are in its favor. It cannot be proved because there is no deep mathematical reason why it has to be true. (Note for experts: this argument does not work if we use powers of three instead of powers of five. In that case the statement is easy to prove because the reverse of a number divisible by three is also divisible by three. Divisibility by three happens to be a non-random property of the digits).
It is easy to find other examples of statements that are likely to be true but unprovable. The essential trick is to find an infinite sequence of events, each of which might happen by accident, but with a small total probability for even one of them happening. Then the statement that none of the events ever happens is probably true but cannot be proved.
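Dyson’s “easy to check” step can indeed be done in a few lines (a sketch; the bound on the powers of five is mine, chosen to cover every possible reversal):

```python
def reverse_digits(n: int) -> int:
    """Reverse the decimal digits of n."""
    return int(str(n)[::-1])

# All powers of five up to 10**10, enough to cover any reversal of a
# number below a billion (at most 9 digits).
powers_of_five = set()
p = 5
while p < 10**10:
    powers_of_five.add(p)
    p *= 5

hits = []
p = 2
while p < 10**9:
    if reverse_digits(p) in powers_of_five:
        hits.append(p)
    p *= 2

print(hits)  # [] : no power of two below a billion reverses to a power of five
```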
• How does the randomness of the digits imply that the statement cannot be proven? Superficially the quote seems to use two different notions of randomness, namely “we cannot detect any patterns” (i.e. a pure random generator is the best predictor we have) and “we have shown that there can be no patterns” (i.e. we have shown no other predictor can do any better). Is this a known result from Ergodic Theory?

• I was about to type out a rebuttal of this, but halfway through I realized I actually agree with you. The “some non-random property” of the digits of the powers of two is that they are all digits found in order inside of powers of two. I would even go so far as to say that if the statement really can’t be proven (even taking into account the fact that the digits aren’t truly random) then there’s a sense in which it isn’t true. (And if it can’t be proven false, then I’d also say it isn’t false.)

• I don’t think it’s a fair deduction to conclude that Goldbach’s conjecture is “probably true” based on an estimate of the measure (or probability) of the set of possible counterexamples being small. The conjecture is either true or false, but more to the point I think you are using the words probability and probable in two different ways (the measure-theoretic sense, and in the sense of uncertainty about the truth value of a statement), which obfuscates (at least to me) what exactly the conclusion of your argument is.

There is of course a case to be made about whether it matters if Goldbach’s conjecture should be considered as true if the first counterexample is larger than any number that could possibly and reasonably manifest in physical reality. Maybe this was what you are getting at, and I don’t really have a strong or well-thought-out opinion either way on this.

Lastly, I wonder whether there are examples of heuristic calculations which make the wrong prediction about the truth value of the conjecture to which they pertain. I’m spitballing here, but it would be interesting to see what the heuristics for Fermat’s theorem say about Euler’s sum of powers conjecture (of which Fermat’s last theorem is the K = 2 case), since we know that the conjecture is false for K = 4. More specifically, how can we tell a good heuristic from a bad one? I’m not sure, and I also don’t mean to imply that heuristics are useless, more that maybe they are useful because they (i) give one an idea of whether to try to prove something or look for a counterexample, and (ii) give a rough idea of why something should be true or false, and what direction a proof should go in (e.g. for Goldbach’s conjecture, it seems like one needs to have precise statements about how the primes behave like random numbers).

• Note that the probabilistic argument fails for n=3 for Fermat’s last theorem; call this (3,2) (power = 3, number of summands is 2).

So we know (3,2) is impossible; Euler’s conjecture is the equivalent of saying that (n+1,n) is also impossible for all n. However, the probabilistic argument fails for (n+1,n) the same way as it fails for (3,2). So we’d expect Euler’s conjecture to fail, on probabilistic grounds.

In fact, the surprising thing on probabilistic grounds is that Fermat’s last theorem is true for n=3.

• There is of course a case to be made about whether it matters if Goldbach’s conjecture should be considered as true if the first counterexample is larger than any number that could possibly and reasonably manifest in physical reality. Maybe this was what you are getting at, and I don’t really have a strong or well-thought-out opinion either way on this.

That would be sufficient, but there are more easily met conditions like:

1)

if you have a bunch of sequences of observations (100 or 1000 years’ worth)

and none of them include the counterexample as an observation with high probability

2)

The frequency is low enough that it isn’t worth accounting for.

For example, if you flip a coin it comes up heads or tails: is it worth bringing up another possibility?

• So Andrew Wiles’s genius was in showing there were no unexpected obstructions for the “likely” outcome to be true. That’s why the proof is so hard: he was trying to prove something very “likely”, and show an absence of structure, rather than a presence, without knowing what that structure could be.

This is a poor description of Wiles’s proof; in fact, I would call it diametrically wrong. Wiles proved the presence of a very rigid structure, not the absence, and the presence of this structure implied FLT via the work of other mathematicians.

I don’t have a great understanding of the broader point you are making, so I don’t know how big an issue this mistake presents. However, be aware that the paradigm you’ve extracted from the ideas in this post has led to at least one incorrect prediction.

I’ll try to explain how Wiles’s proof diverges from your model of it by way of analogy. Suppose that we instead wanted to prove Fermat’s first theorem:

Fermat’s first theorem: For every even integer $n \geq 2$, there are no nontrivial integer solutions to the equation $x^n + y^n = -z^n$.

Further suppose that in our world, mathematicians know about the notion of positivity and absolute values, but the proof of the following fact has long evaded them.

Positivity conjecture: For every integer $x$, we have $x^2 \geq 0$.

The positivity conjecture is a very important structural fact about the integers. And it immediately implies Fermat’s first theorem (since the left-hand side must be positive, but the right-hand side must be negative unless x,y,z are all 0). So Fermat’s first theorem follows from an important structural fact.

However, in our supposed world, mathematicians don’t have access to the positivity conjecture. They might perform the exact same analysis in your post (it goes through verbatim!), and conclude that if you check Fermat’s first theorem for enough $n$, then it is probably true. However, it is not true that the proof of FFT via the positivity conjecture is “proving an absence of structure”: quite the opposite!

The analogue of the positivity conjecture in the real world is the Modularity theorem. This is what Wiles actually proved, and it was already known that the Modularity theorem implies FLT. And as with the positivity conjecture, the Modularity theorem is a very powerful structural result. To give a slogan, it says that every elliptic curve over $\mathbb{Q}$ is “modular,” meaning that it relates in an appropriate way to an object called a modular form.

• Wiles proved the presence of a very rigid structure, not the absence, and the presence of this structure implied FLT via the work of other mathematicians.

If you say that “Wiles proved the Taniyama–Shimura conjecture” (for semistable elliptic curves), then I agree: he’s proved a very important structural result in mathematics.

If you say he proved Fermat’s last theorem, then I’d say he’s proved an important-but-probable lack of structure in mathematics.

So yeah, he proved the existence of structure in one area, and (hence) the absence of structure in another area.

And “to prove Fermat’s last theorem, you have to go via proving the Taniyama–Shimura conjecture”, is, to my mind, strong evidence for “proving lack of structure is hard”.

• I’m not sure what’s going on with the small numbers though. When I use the formula to try and get the probability of Goldbach’s conjecture holding for all numbers > 2, I get around 10^-15.

More sophisticated formulae that take into account the even/odd pattern still get around 10^-6. Of course, this is sensitively dependent on exactly where you start. But it seems that this probabilistic reasoning, showing so strongly that a small even number shouldn’t be a sum of primes, indicates a problem with the probabilistic model.

• Yes, I got that result too. The problem is that the prime number theorem isn’t a very good approximation for small numbers. So we’d need a slightly more sophisticated model that has more low numbers.

I suspect that moving from “sampling with replacement” to “sampling without replacement” might be enough for low numbers, though.

• For a more precise probabilistic approach to Fermat’s last theorem by Feynman, which takes into account the repartition of the n-th powers, see this amazing article: http://www.lbatalha.com/blog/feynman-on-fermats-last-theorem

• Thanks! It’s cool to see his approach.

• Since a+b = b+a, shouldn’t the total number of ‘different sums’ be half of what you give? Fortunately the rest of the argument works completely analogously.

• You can see this as sampling $(N/\log(N))^2$ times sorta-independently, or as sampling $(N/\log(N))^2/2$ times with less independence (ie most sums are sampled twice).

Either view works, and as you said, it doesn’t change the outcome.