Very little / hard to evaluate. I have been doing my best to carefully avoid saying things like “do math/science research”, unless speaking really loosely, because I believe that’s quite a poor category. It’s like “programming”: sure, writing a CRUD app or tweaking a UI has a lot in common with “think of a genuinely novel algorithm and implement it effectively in context”, but it’s really not the same thing. Quoting from https://www.lesswrong.com/posts/sTDfraZab47KiRMmT/views-on-when-agi-comes-and-on-strategy-to-reduce#_We_just_need_X__intuitions :
Most instances of a category are not the most powerful, most general instances of that category. So just because we have, or will soon have, some useful instances of a category, doesn’t strongly imply that we can or will soon be able to harness most of the power of stuff in that category. I’m reminded of the politician’s syllogism: “We must do something. This is something. Therefore, we must do this.”.
So the outcome of “think of a novel math concept that mathematicians then find interesting” is nontrivially more narrow than “prove something nontrivial that hasn’t been proved before”. I haven’t evaluated the actual results though and don’t know how they work. If mathematicians start reading off lots of concepts directly from the LLMs’ reasoning and find them interesting and start using them in further theory, that would be surprising / alarming / confusing, yeah, though it also wouldn’t be mindblowing if it turns out that “a novel math concept that mathematicians then find interesting” is also a somewhat poor category in the same way, and actually there are such things to be found without it being much of a step toward general intelligence.
A bit of a necrocomment, but I’d like to know if LLMs solving unsolved math problems has changed your mind.
Erdős problems 205 and 1051: see “AI contributions to Erdős problems” on the teorth/erdosproblems wiki. Note: I don’t know what LLM Aristotle is based on, but Aletheia is based on Gemini.
Also this paper: “Extremal descendant integrals on moduli spaces of curves: An inequality discovered and proved in collaboration with AI” (arXiv:2512.14575).