I would presume that the process of AI improvement can also be modelled as:
A.) Coming up with good research ideas.
B.) Finding the precise formulation of that idea that makes the most sense/works.
C.) Implementing the idea.
If you claim that C) only “takes hours”, then with an AI coder it takes seconds instead (today's agents work correctly only 50-70% of the time, which is why a programmer still has to spend those couple of hours).
Then the loop becomes tighter: a single iteration takes a few hours less.
Let’s assume there’s a very creative engineer who can come up with a couple of ideas a day.
What is the B-step? Finding the formulation means e.g. getting the math equations, right? LLMs are becoming superhuman at math already this year. If they’re superhuman, then the loop becomes tighter.
Then instead of spending a day on an idea (a few hours of implementation), you test a bunch of them a day.
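To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The function names and every number in it (a 3-hour manual implementation, 1 hour to formulate an idea, an 8-hour workday, a 60% agent success rate picked from the 50-70% range above) are illustrative assumptions of mine, not measurements.

```python
def expected_impl_hours(p_agent_ok: float, agent_hours: float, manual_hours: float) -> float:
    """Expected implementation time per idea.

    Assumption: the agent always runs first (agent_hours); with probability
    (1 - p_agent_ok) its output is wrong and a programmer redoes it by hand.
    """
    return agent_hours + (1.0 - p_agent_ok) * manual_hours


def ideas_per_day(formulation_hours: float, impl_hours: float, workday_hours: float = 8.0) -> float:
    """How many ideas fit in one workday if each costs formulation + implementation."""
    return workday_hours / (formulation_hours + impl_hours)


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    manual = 3.0   # hours to implement by hand
    agent = 0.01   # hours (well under a minute) for the agent to produce code
    p_ok = 0.6     # agent works correctly 60% of the time

    without_agent = ideas_per_day(formulation_hours=1.0, impl_hours=manual)
    with_agent = ideas_per_day(
        formulation_hours=1.0,
        impl_hours=expected_impl_hours(p_ok, agent, manual),
    )
    print(f"ideas/day without agent: {without_agent:.2f}")
    print(f"ideas/day with agent:    {with_agent:.2f}")
```

Under these assumptions the gain is real but bounded: as long as the agent fails a sizable fraction of the time, the manual fallback still dominates the expected implementation cost, and the formulation step becomes the next bottleneck.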
Also, A) can probably get automated too, with a framework in which you make the model read all the literature and propose combinations of ideas, which you then filter. Each new model makes the proposals more relevant.
So all 3 steps get semi-automated (and gradually tighten with each new model release), and the human’s role boils down to filtering things out. That is the “taste” quality that Kokotajlo mentions.
Let’s instead assume a top engineer has a really consequential idea every couple of months. Now what?
Speeding up implementation just means that you test more of the less promising ideas.
Speeding up feedback might mean that you can home in on the really good ideas faster, but does this actually happen if you don’t do the coding and don’t do the math yourself?