Great question—thanks!

I agree that a large fraction, and probably most, of the progress here will be driven by increases in qualitative intelligence, rather than by running more parallel copies or having copies think faster. Especially because we will be relatively “rich” in parallel copies during this period, so we may hit even sharper diminishing marginal returns (DMR) from doubling the number of parallel copies than we do today.
I do try to estimate the size of the effect of having qualitatively smarter researchers and incorporate it into the model.
Basically, I look at data on how a marginal increase in qualitative capability increases productivity today, and translate that into an equivalent marginal increase in parallel copies. So while it might be impossible for any number of average-capability people to make as much progress as one super-genius, we can ask how many average-capability people would make similarly fast progress to one slightly-above-average-capability person. So maybe we find that a one standard deviation increase in productivity within the human distribution is equivalent to increasing the number of parallel researchers by 3X, which we think speeds up the pace of progress by 1.5X.
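To make the translation concrete, here's a rough sketch using the illustrative numbers above (3X copies per standard deviation, 1.5X speed-up); these aren't calibrated values. If the pace of progress scales as copies^alpha, those two numbers pin down alpha:

```python
import math

# Illustrative numbers from the example above (not calibrated values):
# 1 standard deviation of researcher capability ~ 3X parallel copies,
# and 3X parallel copies ~ 1.5X faster progress.
copies_per_sd = 3.0
speedup_per_sd = 1.5

# If pace scales as copies**alpha, then copies_per_sd**alpha = speedup_per_sd.
alpha = math.log(speedup_per_sd) / math.log(copies_per_sd)
print(f"implied parallel-penalty exponent alpha = {alpha:.2f}")  # ~0.37
```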
Then I chain that forward to higher capability levels. I.e. I assume that every further standard deviation increase will continue to speed up the pace of progress by 1.5X. Eventually we’ll be considering the super-genius, and we might think that they can speed up progress by 10X, and that (in reality) no number of parallel average-capability people could have produced a similarly-sized speed-up. But that’s all ok as long as our two core assumptions hold: the first standard deviation speeds up progress by 1.5X, and each further standard deviation has the same effect.
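For instance, chaining forward, a super-genius placed at (say) +6 standard deviations above average, which is my illustrative placement rather than a figure from the model, comes out at roughly the 10X speed-up mentioned above:

```python
# Chaining the per-SD speed-up forward. The +6 SD placement of the
# "super-genius" is an illustrative assumption, not a model parameter.
speedup_per_sd = 1.5
sds_above_average = 6
total_speedup = speedup_per_sd ** sds_above_average
print(f"implied speed-up at +{sds_above_average} SD: {total_speedup:.1f}X")  # ~11.4X
```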
(In the model itself, we translate all this into parallel equivalents. But the model also assumes that the DMR to adding more parallel copies stays constant even as the number of “parallel copies” becomes absurdly high. In this example, it assumes that each time you 3X the number of “parallel copies”, the pace of progress increases by 1.5X, no matter how many parallel copies you already have. Now, this assumption is unrealistic in the extreme when we’re considering literal parallel copies. But it’s actually an appropriate assumption to make when most of our “parallel copies” are actually coming from more qualitative capability and we expect the effects of increasing qualitative capability to stay constant. So I think the model makes two unrealistic assumptions that (hopefully!) roughly cancel out in expectation. First, it translates qualitative capability gains to parallel copies. Second, it assumes that adding more parallel copies continues to be helpful without limit. But that’s ok, as long as qualitative capability gains continue to be helpful without limit.)
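A quick check that the constant-elasticity form really does behave this way, i.e. that each tripling of “parallel copies” gives the same 1.5X no matter how many copies you already have:

```python
# With a constant-elasticity pace function, pace ~ copies**alpha, every 3X
# jump in "parallel copies" gives the same 1.5X speed-up regardless of the
# starting level (alpha ~ 0.37 from the sketch above).
alpha = 0.369

def pace(copies: float) -> float:
    return copies ** alpha

for base in (10, 10_000, 10_000_000):
    print(f"{base:>10} -> {pace(3 * base) / pace(base):.2f}X per tripling")  # 1.50X each time
```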
I just calculated what my median parameters imply about the productivity gains from marginally more qualitative capability. (FYI, I link to my data sources for calculating the marginal impact of more capable models—search for “upwards for improving capabilities”.) I’m implicitly assuming that a 10X increase in effective training compute increases the pace of researcher progress by 4X. (See sheet.)
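For reference, that 4X-per-OOM assumption converts to roughly a 1.5X gain per doubling of effective compute:

```python
import math

# The median assumption above: 10X effective training compute -> 4X faster
# researcher progress. Converting to a per-doubling figure:
gain_per_oom = 4.0
gain_per_doubling = gain_per_oom ** math.log10(2)
print(f"~{gain_per_doubling:.2f}X per doubling of effective compute")  # ~1.52X
```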
In the same sheet, I also just did a new calculation of the same quantity, using 1) AI-2027’s survey of lab researchers to estimate the productivity gain from replacing median researchers with experts, and 2) Ryan Greenblatt’s estimate of how much effective compute is needed to increase capability from a worse to a better human expert. This produced only 2.2X—and it might be higher if all the gains were used to increase qualitative capability (Ryan imagines some being used for more copies and faster thinking). (Edited to add: I initially thought this was only 1.5X, which was out of whack with my assumptions.) That’s a fairly big discrepancy, suggesting I may have given too much weight to qualitative capability increases. Though combined with my other assumptions, this calc implies that, on the current margin, doubling effective training compute and running the same number of copies would be worse than running 2X as many parallel copies, which seems wrong (see cell D9).
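Here's a sketch of the comparison I take cell D9 to be making, using the 2.2X-per-OOM figure from the survey-based calc and the illustrative alpha from earlier (the sheet's own parameters may differ):

```python
import math

# Comparing two uses of the same marginal resources, on the current margin:
# (a) double effective training compute, same number of copies;
# (b) same compute, 2X as many parallel copies.
gain_per_oom = 2.2   # per-OOM capability gain from the survey-based calc
alpha = 0.369        # parallel-penalty exponent from the earlier sketch

from_capability = gain_per_oom ** math.log10(2)  # double compute, same copies
from_copies = 2.0 ** alpha                       # same compute, 2X copies

print(f"2X compute into capability: {from_capability:.2f}X")  # ~1.27X
print(f"2X parallel copies:         {from_copies:.2f}X")      # ~1.29X
# At these numbers, extra parallel copies beat extra capability on the
# margin, which is the seemingly-wrong implication flagged above.
```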
Anyway, that’s all to say that your question is a great one, and I’ve tried to incorporate the key consideration you’re pointing to, but as you predicted it’s hard to model well and so this is a big source of uncertainty in the model. It enters into my estimate of r.
“When you build a new AI that is now the most intelligent being in the world, it starts doing research and finds many ideas that are easy for it and near impossible for all the beings in the world that came before it.”
This is an interesting argument for thinking that the returns to intelligence as you move above the human range will be bigger than the returns within that range. In principle you could measure that by considering the humans who are literally the very best, or by seeing how the returns to intelligence change as you approach that person. As you home in on the most intelligent person, they should increasingly benefit from this “there’s no one else to pluck the low-hanging fruit” effect.
And my guess is that doing this would tend to increase the estimated effect here. E.g. AI-2027’s survey compared the median to the top expert. But maybe if it had compared the 10th best to the very best, the predicted speed-up would have been similar, and so I’d have calculated a more intense speed-up per OOM of extra effective compute (more than the 2.2X per OOM that I in fact calculated).
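To illustrate the mechanism: if roughly the same overall speed-up held over a narrower capability gap (fewer OOMs of effective compute), the implied per-OOM gain would be larger. All the specific numbers below are hypothetical:

```python
# If a similar overall speed-up G holds over a smaller capability gap
# (measured in OOMs of effective compute), the implied per-OOM gain rises:
# per_oom = G ** (1 / gap_ooms). All numbers here are purely illustrative.
G = 2.5  # hypothetical speed-up from a narrower comparison, e.g. 10th-best -> best

for gap_ooms in (1.16, 0.6):  # a wider gap vs a hypothetical narrower gap
    print(f"gap of {gap_ooms} OOMs -> {G ** (1 / gap_ooms):.1f}X per OOM")  # 2.2X, then 4.6X
```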
From a skim, it seems you should be using the 6.25x value rather than the 2.5x in B2 of your sheet. If I’m skimming it correctly, 6.25x is the estimate for replacing a hypothetical all-median lab with a hypothetical all-top-researcher lab, which is what occurs when you improve your ASARA model. Whereas 2.5x is the estimate for replacing the actual lab with an all-top lab.
This still gives a value lower than 4x, but I think if you plug in reasonable log-normals, 4x will be within your 90% CI, and so it seems fine.
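As a sketch of that log-normal check (the central estimate and the spread are assumed for illustration, not taken from the sheet):

```python
# Log-normal sanity check: does 4X fall inside a 90% CI around the
# central per-OOM estimate? Both numbers below are assumptions.
median = 2.5        # hypothetical central per-OOM gain implied by the 6.25x figure
sigma_log10 = 0.25  # assumed spread of the estimate, in log10 space

z95 = 1.645  # z-score for the 5th/95th percentiles
lo = median * 10 ** (-z95 * sigma_log10)
hi = median * 10 ** (z95 * sigma_log10)
print(f"90% CI: {lo:.1f}X to {hi:.1f}X; contains 4X: {lo <= 4 <= hi}")
```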
Thanks, great catch. Corrected this.

And I realise Ryan’s seemingly assuming we only use some of the gains for better qualitative capabilities. So that would further reduce the discrepancy.