It looks like Gemini is self-improving in a meaningful sense:
https://deepmind.google/discover/blog/alphaevolve-a-gemini-powered-coding-agent-for-designing-advanced-algorithms/
Some quick thoughts:
This has been going on for months; on the bullish side (for AI progress, not human survival), this means some form of self-improvement is already well behind the capability frontier. On the bearish side, we perhaps shouldn't expect a further speed-up on the log scale, since it's already factored into some calculations.
I did not expect this degree of progress so soon; I am now much less certain about the limits of LLMs and less prepared to dismiss very short timelines.
With that said… the problems it has solved do seem to have an exhaustive-search flavor. For instance, it apparently solved an open math problem, but one that involved arranging a bunch of spheres, and I'm not sure to what degree LLM insight was required beyond throwing a massive amount of compute at trying possibilities. The self-improvements GDM reports are similar, like faster matrix multiplication in (I think) the 4x4 case. I do not know enough about these areas to judge whether AI is essential here or whether a vigorous brute-force search would have worked. At the very least, the system does seem to specialize in problems with highly verifiable solutions. I am impressed, but not completely convinced.
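To make the "highly verifiable solutions" point concrete, here is a toy propose-and-verify loop of the general kind these systems are built around. To be clear, everything in it is my own illustrative construction (the packing-style objective, the names score and propose_variant, the random perturbation standing in for an LLM edit), not GDM's actual setup; the point is only that the loop needs nothing but an automatic scorer, which is why search-heavy, easily checked problems are the natural fit.

```python
import random

# Toy stand-in for a verifiable objective: score a set of 2-D points by their
# minimum pairwise distance (bigger is better), a crude packing-style metric.
# An AlphaEvolve-like loop needs exactly this property: cheap, automatic scoring.
def score(points):
    dists = [
        ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
        for i, (ax, ay) in enumerate(points)
        for (bx, by) in points[i + 1:]
    ]
    return min(dists) if dists else 0.0

# Stand-in for the proposal step: here it is just random perturbation.
# In the real system the "mutation" is an LLM editing a program, not jittering numbers.
def propose_variant(points, step=0.05):
    return [(x + random.uniform(-step, step), y + random.uniform(-step, step))
            for x, y in points]

def evolve(n_points=8, generations=2000):
    best = [(random.random(), random.random()) for _ in range(n_points)]
    best_score = score(best)
    for _ in range(generations):
        candidate = propose_variant(best)
        # keep candidates inside the unit square so comparisons stay fair
        candidate = [(min(max(x, 0.0), 1.0), min(max(y, 0.0), 1.0)) for x, y in candidate]
        s = score(candidate)
        if s > best_score:  # the verifier decides; no human judgment needed
            best, best_score = candidate, s
    return best, best_score

if __name__ == "__main__":
    _, s = evolve()
    print(f"best min pairwise distance: {s:.3f}")
```

Whether the proposals come from an LLM or from random mutation, the evaluator does the selecting, which is part of why it's hard to tell from the outside how much of the result is insight versus compute.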
Also, for the last couple of months, whenever I've asked why LLMs haven't produced novel insights, I've often gotten the response "no one is just letting them run long enough to try." Apparently GDM did try it (as I expected), and it seems to have worked somewhat well (as I did not expect).
Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI.
But I do have some quick thoughts as well:
Kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).
It seems to me like AlphaEvolve is more or less an improvement over previous systems that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor), notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to improving AI hardware. What AlphaEvolve seems to do is unify all of that into one superhuman system for those multiple uses. In the accompanying podcast they give us some further information:
The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
They have not tried to distill all that data into a new model yet, which seems strange to me considering they’ve had it for a year now.
They say that a lot of improvements come from the base model’s quality.
They do present the whole thing as part of research rather than as a product.
So yeah, I can definitely see a path to large gains in the future, though for now those are still on similar timetables, by their own admission. They expect further improvements when base models improve, and they hope future versions of AlphaEvolve can in turn shorten model training time, speed up the hardware pipeline, and improve models in other ways. As for your point about novel discoveries, previous Alpha systems already seemed able to do the same categories of research back in 2023, on mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare this with previous systems of the same class.
This is also a very small thing to keep in mind, but GDM doesn't often share the actual results of its models' work as usable, replicable papers, which has led experts to cast some doubt on its results in the past. It's hard to verify the results, since they'll be keeping them close to their chest.