The greatest praise that Paul Erdős, maybe the most prolific mathematician who ever lived, could give a proof was to proclaim it “straight from The Book”.

GPT-5.4 Pro one-shot the solution to Erdős Problem #1196 in 80 minutes (plus “another 30 ish mins to convert the solution to a latex math paper”). Math Inc later formalised it in Lean:

Jared Lichtman, a world expert on this problem, proclaimed GPT-5.4 Pro’s solution to be from The Book, perhaps the first. I certainly haven’t seen anything like it; nothing from Gavin’s thread compares, for instance. There are big-shot names below: James Maynard is a recent Fields medalist at the peak of his powers, Jacob Fox is arguably a Fields-level combinatorialist, and so on:
In my doctorate, I proved the Erdős Primitive Set Conjecture, showing that the primes themselves are maximal among all primitive sets.
This problem will always be in my heart: I worked on it for 4 years (even when my mentors recommended against it!) and loved every minute of it.
[Primitive sets are a vast generalization of the prime numbers: A set S is called primitive if no number in S divides another.]
Now Erdős #1196 is an asymptotic version of Erdős’ conjecture, for primitive sets of “large” numbers. It was posed in 1966 by the Hungarian legends Paul Erdős, András Sárközy, and Endre Szemerédi.
I’d been working on it for many years, and consulted/badgered many experts about it, including my mentors Carl Pomerance and James Maynard.
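(To make “primitive” concrete, here is a minimal Python sketch of my own, not code from the paper: a primitivity check straight from Jared’s bracketed definition, plus the Erdős sum \sum_{n \in S} 1/(n \log n), the quantity that the Erdős Primitive Set Conjecture says is maximized over primitive sets by the primes.)

```python
from itertools import combinations
from math import log

def is_primitive(s):
    """True if no element of the set divides another element of it."""
    s = sorted(set(s))  # after sorting, a < b for every pair below
    return not any(b % a == 0 for a, b in combinations(s, 2))

def erdos_sum(s):
    """The Erdős sum of 1/(n log n), finite for every primitive set of
    integers n >= 2 (Erdős, 1935) and conjectured (now proved) to be
    maximized by the primes."""
    return sum(1.0 / (n * log(n)) for n in s)

assert is_primitive({2, 3, 5, 7, 11})  # primes never divide one another
assert not is_primitive({3, 5, 15})    # 3 divides 15
assert is_primitive({6, 10, 15})       # pairwise non-dividing composites
print(erdos_sum({2, 3, 5, 7, 11}))     # partial Erdős sum over five primes
```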
Jared later wrote:

The proof produced by GPT-5.4 Pro was quite surprising, since it rejected the “gambit” that was implicit in all works on the subject since Erdős’ original 1935 paper. The idea to pass from analysis to probability was so natural & tempting from a human-conceptual point of view that it obscured a technical possibility: retaining (efficient, yet counter-intuitive) analytic terminology throughout, by use of the von Mangoldt function \Lambda(n).
The closest analogy I would give: the main openings in chess are well studied, but an AI discovers a new opening line that had been overlooked because of human aesthetics and convention.
In fact, the von Mangoldt function itself is celebrated for its connection to primes and the Riemann zeta function—but its piecewise definition appears odd and unmotivated to students seeing it for the first time. By the same token, in Erdős #1196, the von Mangoldt weights seem odd and unmotivated but turn out to cleverly encode a fundamental identity \sum_{q|n}\Lambda(q) = \log n, which is equivalent to unique factorization of n into primes. This is the exact trick that breaks the analytic issues arising in the “usual opening”.
Moreover, Terry Tao has long suspected that the applications of probability to number theory are unnecessarily complicated and this “trick” might actually clarify the general theory, which would have a broader impact than solving a single conjecture.
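(A quick numerical sanity check of that identity, again a minimal sketch of my own rather than anything from the paper: \Lambda(n) equals \log p when n is a power of a prime p and is 0 otherwise, and summing it over the divisors of n recovers \log n, one \log p contribution per prime-power factor.)

```python
from math import isclose, log

def mangoldt(n):
    """von Mangoldt function: log p if n = p^k for some prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while n % p:       # find the smallest prime factor p of n
        p += 1
    while n % p == 0:  # strip every factor of p out of n
        n //= p
    return log(p) if n == 1 else 0.0  # pure power of p iff nothing is left

# The identity sum_{q|n} Lambda(q) = log n, i.e. unique factorization:
for n in range(2, 500):
    divisor_sum = sum(mangoldt(d) for d in range(1, n + 1) if n % d == 0)
    assert isclose(divisor_sum, log(n))
```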
Although Tao still wouldn’t call it “original”, rather “clever”.
Sure, whatever. (ETA: not sure why I wrote this. Garrett’s response below is obviously correct.)
Terry Tao seemed pretty excited about this, judging by his ~14 comments in the thread. Here he said:
In any case, I would indeed say that this is a situation in which the AI-generated paper inadvertently highlighted a tighter connection between two areas of mathematics (in this case, the anatomy of integers and the theory of Markov processes) than had previously been made explicit in the literature (though there were hints and precursors scattered therein which one can see in retrospect). That would be a meaningful contribution to the anatomy of integers that goes well beyond the solution of this particular Erdos problem.
r/math has for the longest time been almost constitutionally allergic to frontier AI x math progress updates. So the chatter there felt like quite the vibe shift in one of the last bastions of capabilities skepticism. Joshua Zelinsky, who I used to follow on math Quora back in the day, summarised it like so:
Four things to note: First, #1196 is a decently well-known problem. It wasn’t at Erdős–Straus levels of fame, but it is well known enough that I was familiar with it. Second, this is not a problem where no one had worked on it; there was a lot of prior work on it and closely related problems. Third, this is not an example where the AI made small modifications to things in the literature, or recognized that large parts of the problem were solved in an obscure paper. The approach the AI used goes in a largely different direction than the literature on this problem did. Fourth, and closely related to the third, parts of this proof look like they will inspire subsequent proofs, because it really is going in a different direction, one which now looks likely to be a productive line of investigation for similar problems.
I am not fond of putting words like “stunning” in titles; it can be very clickbaity and feels like a hype word. But this really is a case where the word isn’t unreasonable, even if I myself would not go so far as to use it here.
Do you think the distinction Jared is making isn’t real?

The point Jared is making seems meaningful to me. It is important to remember that a very important part of math research is not only proving theorems but also innovating on definitions and formalisms. That is, coming up with novel concepts. It indeed seems correct to operationalize “original” as “involving novel concepts” to me, and this observation should not be dismissed simply because it can be pattern-matched to arguments the ignorant make.
Agree, it’s the thing Scholze seems to do spectacularly well for instance. Not sure why I wrote that, thanks for the callout.
Update: Jared Lichtman, Terry Tao, and a few others just posted this paper to the arXiv. Abstract:
Abstract. A set of integers is primitive if no number in the set divides another. We introduce a new method for bounding Erdős sums of primitive sets, suggested from output of GPT-5.4 Pro, based on Markov chains with von Mangoldt weights. The method leads to a host of applications, yet seems to have been overlooked by the prior literature since Erdős’ seminal 1935 paper.
As applications, we prove two 1966 conjectures of Erdős–Sárközy–Szemerédi, on primitive sets of large numbers (#1196) and on divisibility chains (#1217). The method also provides a short proof of the Erdős Primitive Set Conjecture (#164), as well as the related claim that 2 is an “Erdős-strong” prime. Moreover, the method resolves (a revised form of) the Banks–Martin conjecture, which has long been viewed as a unifying ‘master theorem’ for the area.
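(The phrase “Markov chains with von Mangoldt weights” is suggestive even without the paper: by the identity above, the weights \Lambda(q)/\log n form a probability distribution over the prime-power divisors q of n, so they define a downward random walk on the integers. Below is a minimal sketch of that walk; to be clear, this is my own guess at the flavor of the construction, and the chain in the paper may well differ.)

```python
import random
from math import log

def mangoldt(n):
    """von Mangoldt function: log p if n = p^k for some prime p, else 0."""
    if n < 2:
        return 0.0
    p = 2
    while n % p:
        p += 1
    while n % p == 0:
        n //= p
    return log(p) if n == 1 else 0.0

def step(n):
    """Move from n to n // q, picking the prime-power divisor q of n with
    probability Lambda(q)/log(n); the weights sum to exactly log n."""
    qs = [d for d in range(2, n + 1) if n % d == 0 and mangoldt(d) > 0]
    q = random.choices(qs, weights=[mangoldt(d) for d in qs])[0]
    return n // q

# One sample trajectory of the downward divisor chain, started at 360.
n = 360
while n > 1:
    print(n, end=" -> ")
    n = step(n)
print(1)
```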
The acknowledgements section:
ChatGPT was used to generate code for several of the images in this paper, to search for relevant literature (for instance, in locating references for the proof of Lemma 3.2), to proofread the paper and to offer additional suggested results and remarks, and to perform numerics to guide the proof of Lemma 3.4.
The initial proof of Theorem 1.1 was generated by an autonomous run of GPT-5.4 Pro; a similar run also established Theorem 1.6. GPT-5.4 Pro was also used to assist with the initial proof of Theorem 1.2, with the main human contributions being the downward divisor chain and suggesting Lemmas 3.2 and 3.3(ii) to establish the sub-invariance property. In addition, an early version of GPT-5.5 Pro was used to assist with the initial proof of Theorem 1.3. Finally, GPT-5.4 Pro helped prove Theorem 1.4. Nevertheless, the final proofs in this paper have been generated and reviewed by the human authors, using the AI-generated proofs as starting points when appropriate.
The Lean formalization in [2] was generated using OpenAI’s Codex. The Lean formalization in [33] was generated using Math Inc.’s Gauss.
Hooray, my prediction six years ago is now unambiguously correct:
“I assert that at some point in the next two years, there will exist an AI engine which when given the total body of human work in mathematics and a small prompt (like the one used in gpt-2), is capable of generating mathematical works that humans in the field find interesting to read, provided of course that someone bothers to try.”
I’m going to go celebrate by saying something else that the people around me think is dumb.
Edit for the tags: I think I was right two years ago because the (low) threshold I set was exceeded with GPT-3 or 3.5 (which arrived within the two-year window), but there was still room for debate. I assert that six years on, there’s no ambiguity about whether the threshold was crossed.
I thought you were making a joke, but your edit confused me.
The thread that comment came from was contentious; I got a lot of pushback here and elsewhere during the early GPT days for my opinion that transformers would be able to output interesting math.

Two years later, when 3.5 was out, I felt that my ‘interesting’ threshold had been crossed and I had been technically correct, but I was still hearing the same arguments. I’m happy that six years on, we have proof that my assessment of the potential of transformers, which, to be clear, was absolutely viewed as ‘evidence that this person is crazy in a way that makes me want to avoid him’, was close to accurate.
From a meta perspective, this post is probably not helping me appear sane.
It shows both that you were right qualitatively and wrong about the numbers.
It’s interesting that it one-shot it. I wonder if anyone had tried asking 5.4 Pro about it before, and if so, how often it produces this proof.
I recall that after 5.4 Pro solved the first Epoch open problem, Epoch found that Opus 4.6 and Gemini 3.1 could both also solve it.
GPT-5.4 Pro replicated this 8 times out of 10 when Arb Research tried it.