I’m starting to feel skeptical about how reasonable/well-defined these capability levels are in the modern paradigm.
My understanding is that reasoning models’ training includes a lot of clever use of other AIs to generate data or to evaluate completions. Could AI companies create similarly capable models on the same budget as their newest reasoning models if their employees’ brains ran at 2x speed, but they couldn’t use earlier AIs for data generation or evaluation?
I’m really not sure. I think plausibly the current reasoning training paradigm just wouldn’t work at all without using AIs in training. So AI companies would need to look for a different paradigm, which might work much less well, which I can easily imagine outweighing the advantage of employees running 2x speed. If that’s the case, does that mean that GPT-4.1 or whatever AI they used in the training of the first reasoning model was plausibly already more than 2x-ing AI R&D labor according to this post’s definition? I think that really doesn’t match the intuition that this post tried to convey, so I think probably the definition should be changed, but I don’t know what would be a good definition.
Ah, an important clarification is that when I refer to AI R&D labor acceleration I mean to refer to AIs accelerating work that employees might do or would typically do.
Note that “using other AIs to generate data or evaluate completions” includes literally running RL on AIs. So I certainly don’t mean to include all versions of this, but would include versions of this that employees would otherwise do (or employees would typically do).
This makes the definition less precise, I’m afraid.
One alternative definition that is more straightforward in the short term (but less meaningful longer term) is to just talk about accelerating research engineering work in AI R&D companies, to explicitly make the scope more narrow and well understood (while the scope of AI R&D overall is less clear). E.g., the application of AIs to automate/accelerate research engineering work is as useful as making research engineers X times more productive. This could in principle be measured with something like uplift trials. See discussion in this post for some thoughts on the relationship between engineering acceleration and AI progress acceleration / AI R&D acceleration.
Thanks. I think the possible failure mode of this definition is now in the opposite direction: it’s possible there will be an AI that provides less than 2x acceleration according to this new definition (it’s not that good at the type of tasks humans typically do), but it’s so good at mass-producing new RL environments or something else, and that mass-production turns out to be so useful, that the existence of this model already kicks off a rapid intelligence explosion. I agree this is not too likely in the short term though, so the new imprecise definition is probably kind of reasonable for now.
You could do something like “2x acceleration of labor that humans would otherwise/typically do” OR “AI progress is overall >1.5x faster (due to AI usage, or maybe just for whatever reason)”.