I’m very skeptical of AI being on the brink of dramatically accelerating AI R&D.
My current model is that ML experiments are bottlenecked not on software-engineer hours, but on compute. See Ilya Sutskever’s claim here:
95% of progress comes from the ability to run big experiments quickly. The utility of running many experiments is much less useful.
What actually matters for ML-style progress is picking the correct trick, and then applying it to a big-enough model. If you pick the wrong trick, you ruin the training run, which (a) potentially costs millions of dollars, and (b) wastes an ocean of FLOP that could have been used for something else.
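To make "millions of dollars" concrete, here is a minimal back-of-envelope sketch. Every number in it (model size, token count, H100 throughput, MFU, rental price) is an illustrative assumption, not a figure from the comment:

```python
# Back-of-envelope cost of a single large pretraining run.
# All numbers below are illustrative assumptions.

params = 70e9                # hypothetical dense model size (parameters)
tokens = 15e12               # hypothetical training tokens
flops = 6 * params * tokens  # standard ~6*N*D estimate of dense-transformer training FLOP

h100_peak = 989e12           # assumed H100 BF16 dense peak, FLOP/s
mfu = 0.40                   # assumed model FLOP utilization
price_per_gpu_hour = 2.00    # assumed rental price, $/GPU-hour

gpu_hours = flops / (h100_peak * mfu) / 3600
cost = gpu_hours * price_per_gpu_hour
print(f"{flops:.2e} FLOP -> {gpu_hours:,.0f} GPU-hours -> ~${cost / 1e6:.1f}M")
# Under these assumptions: ~6.3e24 FLOP, ~4.4M GPU-hours, ~$8.8M for one run.
```

A run that has to be scrapped because one ingredient didn’t scale burns essentially all of that.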
And picking the correct trick is primarily a matter of research taste, because:
Tricks that work on smaller scales often don’t generalize to larger scales (a toy numerical illustration of this is sketched right after this list).
Tricks that work on larger scales often don’t work on smaller scales (due to bigger ML models having various novel emergent properties).
Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case.[1]
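To illustrate the first point above: a purely toy sketch, with entirely made-up loss curves, of how a trick that wins at every small-scale budget you can afford to measure can still lose once extrapolated to frontier scale:

```python
# Toy illustration only: all curves below are invented, not real measurements.
import numpy as np

compute = np.array([1e18, 3e18, 1e19, 3e19, 1e20])  # small-scale experiment budgets (FLOP)

# Hypothetical measured losses: the "trick" beats the baseline at every
# measured budget, but its loss curve has a slightly shallower slope.
baseline = 3.20 * compute ** -0.050
trick    = 2.48 * compute ** -0.045

def fit_power_law(c, loss):
    """Fit loss ~ a * c^(-b) via a straight line in log-log space."""
    slope, log_a = np.polyfit(np.log(c), np.log(loss), 1)
    return np.exp(log_a), -slope

for name, loss in [("baseline", baseline), ("trick", trick)]:
    a, b = fit_power_law(compute, loss)
    frontier = a * 1e25 ** -b  # extrapolate to a frontier-scale budget
    print(f"{name:8s} loss@1e20={loss[-1]:.3f}  extrapolated loss@1e25={frontier:.3f}")

# With these made-up curves the trick looks better at every measured point
# (0.312 vs 0.320 at 1e20 FLOP), yet the baseline is ahead by 1e25 FLOP
# (0.180 vs 0.186): small-scale wins need not survive extrapolation.
```

(The reverse failure mode, where a trick only starts working at large scale, can’t be caught by small-scale sweeps at all.)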
So 10x’ing the number of small-scale experiments is unlikely to actually 10x ML research, along any promising research direction.
And, on top of that, I expect that AGI labs don’t actually have the spare compute to do that 10x’ing. I expect it’s all already occupied 24/7 running all manner of smaller-scale experiments, squeezing out whatever value can be squeezed out of them. (See e.g. the Superalignment team’s struggle to get access to compute: that suggests there isn’t an internal compute overhang.)
Indeed, an additional disadvantage of AI-based researchers/engineers is that their forward passes would cut into that limited compute budget. Offloading the computations associated with software engineering and experiment oversight onto the brains of mid-level human engineers is potentially more cost-efficient.
As a separate line of argumentation: Suppose, as you describe in another comment, that AI would soon be able to give senior researchers teams of 10x-speed, 24/7-working junior devs, to whom they’d be able to delegate setting up and managing experiments. Is there a reason to think that any need for that couldn’t already be satisfied?
If it were an actual bottleneck, I would expect it to have already been solved: by the AGI labs just hiring tons of competent-ish software engineers. They have vast amounts of money now, and LLM-based coding tools seem competent enough to significantly speed up a human programmer’s work on formulaic tasks. So any sufficiently simple software-engineering task should already be done at lightning speeds within AGI labs.
In addition: the academic-research and open-source communities exist, and plausibly also fill the niche of “a vast body of competent-ish junior researchers trying out diverse experiments”. The task of keeping senior researchers up-to-date on openly published insights should likewise already be possible to dramatically speed up by tasking LLMs with summarizing them, or by hiring intermediary ML researchers to do that.
So I expect the market for mid-level software engineers/ML researchers to be saturated.
So, summing up:
10x’ing the ability to run small-scale experiments seems low-value, because:
The performance of a trick at a small scale says little (one way or another) about its performance on a bigger scale.
Integrating a scalable trick into the SotA-model tech stack is highly nontrivial.
Most of the value and insight comes from full-scale experiments, which are bottlenecked on compute and senior-researcher taste.
AI likely can’t even 10x small-scale experimentation, because that’s also already bottlenecked on compute, not on mid-level engineer-hours. There’s no “compute overhang”; all available compute is already in use 24/7.
If that weren’t the case, there would be nothing stopping AGI labs from hiring mid-level engineers until they were no longer bottlenecked on their time, or from tapping academic-research/open-source results.
AI-based engineers would plausibly be less efficient than human engineers, because their inference calls would cut into the compute that could instead be spent on experiments.
If so, then AI R&D is bottlenecked on research taste, system-design taste, and compute, and there’s relatively little that non-AGI-level models can contribute to it. Maybe a 2x speed-up at most, somehow; not a 10x’ing.
People are not thinking clearly about AI-accelerated AI research. This comment by Thane Ruthenis is worth amplifying.
That claim is from 2017. Does Ilya even still endorse it?
To 10x the compute, you might need to 10x the funding, which AI capable of automating AI research can secure in other ways. Smaller-than-frontier experiments don’t need unusually giant datacenters (which can be challenging to build quickly); they only need a lot of regular datacenters and the funding to buy their time. There are currently millions of H100 chips out in the world, so 100K H100s in a giant datacenter is not the relevant anchor for the scale of smaller experiments; the constraint is funding.
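As a rough illustration of “the constraint is funding” (the cluster size, duration, and rental price below are assumptions, not figures from the comment):

```python
# Sketch: what renting off-the-shelf H100 capacity for a smaller-than-frontier
# experiment costs. All numbers are illustrative assumptions.

gpus = 2_000                # a sizeable but non-frontier experiment cluster
days = 14                   # one experiment cycle
price_per_gpu_hour = 2.50   # assumed market rental price, $/H100-hour

cost = gpus * days * 24 * price_per_gpu_hour
print(f"~${cost / 1e6:.1f}M for {gpus} H100s over {days} days")
# -> ~$1.7M: a funding problem, not a build-a-100K-GPU-datacenter problem.
```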
Thanks for amplifying. I disagree with Thane on some things they said in that comment, and I don’t want to get into the details publicly, but I will say:
it’s worth looking at DeepSeek V3 and what they did with a $5.6 million training run (obviously that is still a nontrivial amount, and their CEO actively says most of the cost of their training runs comes from research talent; a rough reconstruction of that figure is sketched after this list),
compute is still a bottleneck (which is why I’m looking to build an AI safety org to efficiently absorb funding/compute for this), but I think Thane is not acknowledging that some types of research require much more compute than others (though I agree research taste matters, which is also why DeepSeek’s CEO hires for cracked researchers, but I don’t think it’s an insurmountable wall),
“Simultaneously integrating several disjunctive incremental improvements into one SotA training run is likely nontrivial/impossible in the general case.” Yes, seems really hard and a bottleneck...for humans and current AIs.
imo, AI models will become Omega Cracked at infra and hyper-optimizing training/inference to keep costs down soon enough (which seems to be what DeepSeek is especially insanely good at)
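For reference, a minimal reconstruction of the $5.6 million figure from the first bullet above, using the roughly 2.788M H800 GPU-hours and the assumed $2/GPU-hour rental price reported in the DeepSeek-V3 technical report:

```python
# Reconstructing DeepSeek-V3's reported training cost from the technical
# report's figures: ~2.788M H800 GPU-hours at an assumed $2/GPU-hour.

gpu_hours = 2.788e6
price_per_gpu_hour = 2.00
print(f"~${gpu_hours * price_per_gpu_hour / 1e6:.2f}M")  # -> ~$5.58M
# Note: this covers GPU time for the reported training pipeline only; it
# excludes research/ablation runs and staff costs, which is consistent with
# the point above that most of their cost is research talent.
```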
Is this because it would reveal private/trade-secret information, or is this for another reason?
Yes (all of the above)
If you knew it was legal to disseminate the information, and trade-secret/copyright/patent law didn’t apply, would you still not release it?
I mean that it’s a trade secret for what I’m personally building, and I would also rather people don’t just use it freely for advancing frontier capabilities research.
See my response here.