Person comments on Cole Wyeth’s Shortform

Person 14 May 2025 23:38 UTC
Heads up: I am not an AI researcher or even an academic, just someone who keeps up with AI.
But I do have some quick thoughts as well:
Kernel optimization (which they claim is what resulted in the 1% decrease in training time) is something we know AI models are great at (see RE-Bench and the multiple arXiv papers on the matter, including from DeepSeek).
It seems to me like AlphaEvolve is more or less an improvement over previous models that also claimed to make novel algorithmic and mathematical discoveries (FunSearch, AlphaTensor), notably by using better base Gemini models and a better agentic framework. We also know that AI models already contribute to the improvement of AI hardware. What AlphaEvolve seems to do is unify all of that into a superhuman model for those multiple uses. In the accompanying podcast they give us some further information:
- The rate of improvement is still moderate, and the process still takes months. They phrase it as an interesting and promising area of progress for the future, not as a current large improvement.
- They have not tried to distill all that data into a new model yet, which seems strange to me considering they’ve had it for a year now.
- They say that a lot of improvements come from the base model’s quality.
- They do present the whole thing as part of research rather than a product.
So yeah, I can definitely see a path for large gains in the future, though for now those are still on similar timetables, as per their own admission. They expect further improvements when base models improve, and they are hoping that future versions of AlphaEvolve can in turn shorten the training time for models and the hardware pipeline, and improve models in other ways. And for your point about novel discoveries, previous Alpha models already seemed able to do the same categories of research back in 2023, in mathematics and algorithmic optimization. We need more knowledgeable people to weigh in, especially to compare with previous models of the same class.
This is also a very small thing to keep in mind, but GDM doesn’t often share the actual results of its models’ work as usable/replicable papers, which has caused experts to cast some doubt on its results in the past. It’s hard to verify their results, since they’ll be keeping them close to their chest.