These may or may not be the first steps toward foom, but automated improvements are still improvements, regardless of how “innovative-in-themselves” we consider them to be. Improving on an algorithm that’s been the SOTA since 1969 is cool, even if it was done purely via brute force.
For now, it looks like it “only” found minor improvements on various SOTAs, but this was done with previous-generation models (a mix of Gemini 2.0 Flash and Pro)[1]. I’d expect next-gen models and next-gen scaffolds to be another step up.
Models used. AlphaEvolve employs an ensemble of large language models. Specifically, we utilize a combination of Gemini 2.0 Flash and Gemini 2.0 Pro. This ensemble approach allows us to balance computational throughput with the quality of generated solutions.
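(To be clear, the “ensemble” here just means routing generation requests between the two models. Something like the sketch below, where the split ratio and the client interface are my assumptions rather than the paper’s.)

```python
import random

def propose_mutation(prompt: str, flash_client, pro_client,
                     pro_fraction: float = 0.2) -> str:
    """Route most mutation prompts to the fast model (throughput),
    and a minority to the stronger model (quality).

    The 80/20 split and the .generate() interface are guesses;
    the paper doesn't specify either.
    """
    client = pro_client if random.random() < pro_fraction else flash_client
    return client.generate(prompt)
```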
You’re not wrong, but...

The paper does not display any capabilities we were previously unaware of. It’s touted as “AI produces innovations”, as if the AI leveraged research taste and creativity to improve on the human state of the art; as if the hitherto-unattained holy grail of LLMs-reliably-producing-innovations has finally been found.
But in actuality, it’s “an LLM straightforwardly optimizes/improves a codebase that transforms compute into improvements on the SOTA”. We already knew LLMs can sometimes do that.
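Concretely, that setting is roughly the following (a minimal sketch of the usual evolutionary-search scaffold; the function names and loop details are my guesses, not from the paper):

```python
import random

def evolve(seed_program: str, evaluate, llm_mutate, iters: int = 1000):
    """Minimal evolutionary-search loop of the kind an AlphaEvolve-style
    scaffold runs: an LLM proposes code mutations, a hard-coded evaluator
    scores them, and higher-scoring programs survive.

    `evaluate` must be an automatic, machine-checkable fitness function
    (e.g. "scalar multiplications used by this matmul routine").
    """
    population = [(seed_program, evaluate(seed_program))]
    for _ in range(iters):
        parent, parent_score = random.choice(population)
        child = llm_mutate(parent)       # the LLM rewrites part of the code
        try:
            score = evaluate(child)      # compute turned into a scalar signal
        except Exception:
            continue                     # broken candidates are just discarded
        if score > parent_score:
            population.append((child, score))
    return max(population, key=lambda pair: pair[1])
```

The point being: everything hinges on having that cheap, reliable `evaluate()`, and most research questions don’t come with one.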
The issue isn’t that the improvements are minor. It’s that the AIs’ ability to make those improvements in this setting implies ~nothing about their ability to produce innovations in other settings. It’s not really a conceptual-research task.
Granted, the steelman here is that this setting is also the setting of DL research, so this could potentially lead to RSI… But now we run into a Catch-22: if LLMs are in fact not capable of reliably finding nontrivial open-domain discoveries, and can only do so in these limited settings, then no realistic amount of “recursive self-improvement” of LLMs would result in an actual Singularity.