Thanks for this post!
Caveat: I haven’t read this very closely yet, and I’m not an economist. I’m finding it hard to understand why you think it’s reasonable to model an increase in capabilities by an increase in number of parallel copies. That is: in the returns to R&D section, you look at data on how increasing numbers of human-level researchers in AI affect algorithmic progress, but we have ~no data on what happens when you sample researchers from a very different (and superhuman) capability profile. It seems to me entirely plausible that a few months into the intelligence explosion, the best AI researchers are qualitatively superintelligent enough that their research advances per month aren’t the sort of thing that could be done by ~any number of humans[1] acting in parallel in a month. I acknowledge that this is probably not tractable to model, but that seems like a problem because it seems to me that this qualitative superintelligence is a (maybe the) key driving force of the intelligence explosion.
Some intuition pumps for why this seems reasonably likely:
My understanding is that historians of science disagree on whether science is driven mostly by a few geniuses or not. It probably varies by discipline, and by how understanding-driven progress is. Compared to other fields in hard STEM, ML is probably less understanding-driven right now, but it is still relatively understanding-driven. I think there are good reasons to think that it could plausibly transition to being more understanding-driven when the researchers become superhuman, because interp, agent foundations, GOFAI, etc. haven’t made zero progress and don’t seem fundamentally impossible to me. And if capabilities research becomes loaded on understanding very complicated things, then it could become extremely dependent on quite how capable the most capable researchers are, in a way that can’t easily be substituted for by more human-level researchers.
Suppose I take a smart human and give them the ability/bandwidth to memorise and understand the entire internet. That person would be really different to any normal human, and also really different to any group of humans. So when they try to do research, they approach the tree of ideas to pick the low-hanging fruit from a different direction to all of society’s research efforts beforehand, and it seems possible that from their perspective there is a lot of low-hanging fruit left on the tree — lots of things that seem easy from their vantage point and nearly impossible to grasp from our perspective[2]. And research into how strongly returns to ideas in the field have diminished so far is not useful for predicting how much research progress that enhanced human would make in their first year.
It seems hard to know quite how many angles of approach there are on the tree of ideas, but it seems possible to me that on more than one occasion when you build a new AI that is now the most intelligent being in the world, it starts doing research and finds many ideas that are easy for it and near impossible for all the beings in the world that came before it.
[1] or at least only by an extremely large number of humans, who are doing something more like brute force search and less like thinking
[2] This is basically the same idea as Dwarkesh’s point that a human-level LLM should be able to make all sorts of new discoveries by connecting dots that humans can’t connect because we can’t read and take in the whole internet.
Great question—thanks!
I agree that a large fraction, and probably most, of the progress here will be driven by increases in qualitative intelligence, rather than by running more parallel copies or having copies think faster. Especially because we will be relatively “rich” in parallel copies during this period, so we may hit sharper DMR (diminishing marginal returns) from doubling the parallel copies even further than we have hit today.
I do try to estimate the size of the effect of having qualitatively smarter researchers and incorporate it into the model.
Basically, I look at data for how a marginal increase in qualitative capability increases productivity today, and translate that into an equivalent marginal increase in parallel copies. So while it might be impossible for any number of average-capability people to make as much progress as one super-genius, we can ask how many average-capability people would make similarly fast progress to one slightly-above-average-capability person. So maybe we find that one standard deviation increase in productivity within the human distribution is equivalent to increasing the number of parallel researchers by 3X, which we think speeds up the pace of progress by 1.5X.
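To make the bookkeeping concrete, here’s a minimal Python sketch using the illustrative numbers above (the 3X and 1.5X are the example figures from this paragraph, not the model’s fitted parameters), and assuming a constant-elasticity form pace ∝ N^λ for illustration:

```python
import math

# Illustrative numbers from the paragraph above (not the model's fitted parameters):
# one standard deviation of researcher capability ~ 3X more parallel researchers,
# and that 3X-ing is assumed to speed up progress by 1.5X.
parallel_equiv_per_sd = 3.0
speedup_per_sd = 1.5

# Under a constant-elasticity assumption, pace ∝ N**lam, so 3**lam = 1.5.
lam = math.log(speedup_per_sd) / math.log(parallel_equiv_per_sd)
print(f"implied elasticity lambda ≈ {lam:.2f}")  # ≈ 0.37

# Sanity check: doubling parallel researchers would then speed progress by 2**lam.
print(f"speedup from 2X parallel researchers ≈ {2 ** lam:.2f}X")  # ≈ 1.29X
```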
Then I chain that forward through to higher capability levels. I.e., I assume that every standard deviation increase will continue to speed up the pace of progress by 1.5X. Now eventually we’ll be considering the super-genius, and we think perhaps that they can speed up progress by 10X, and we’ll think that (in reality) no number of parallel average-capability people could have produced a similarly-sized speed-up. But that’s all OK as long as our two core assumptions (the first standard deviation speeds up progress by 1.5X, and the other standard deviations are the same) hold.
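And a quick illustration of the chaining, again with the placeholder 1.5X-per-standard-deviation figure (the “10X super-genius” is just the example above, not an estimate):

```python
import math

# Chaining the per-SD speedup forward (placeholder numbers from the example above).
speedup_per_sd = 1.5

# Speedup after n standard deviations of extra capability:
for n in range(1, 7):
    print(f"{n} SD -> {speedup_per_sd ** n:.1f}X faster progress")

# How many SDs above average would a "10X super-genius" correspond to,
# if the per-SD speedup really did stay constant?
n_for_10x = math.log(10) / math.log(speedup_per_sd)
print(f"10X speedup ≈ {n_for_10x:.1f} standard deviations")  # ≈ 5.7
```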
(In the model itself, we translate all this into parallel equivalents. But the model also assumes that the DMR to adding more parallel copies stays constant even as the number of “parallel copies” becomes absurdly high. In this example, it assumes that each time you 3X the number of “parallel copies”, the pace of progress increases by 1.5X, no matter how many parallel copies you already have. Now, this assumption is unrealistic in the extreme when we’re considering literal parallel copies. But it’s actually an appropriate assumption to make when most of our “parallel copies” are actually coming from more qualitative capability and we expect the effects of increasing qualitative capability to stay constant. So I think the model makes two unrealistic assumptions that (hopefully!) roughly cancel out in expectation. First, it translates qualitative capability gains to parallel copies. Second, it assumes that adding more parallel copies continues to be helpful without limit. But that’s ok, as long as qualitative capability gains continue to be helpful without limit.)
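A small sketch of that bookkeeping (same placeholder numbers; the constant-elasticity form is assumed for illustration): translating each standard deviation into 3X more “parallel equivalents” and applying a fixed elasticity gives the same 1.5X per standard deviation no matter how large N already is, which is why the two unrealistic assumptions can offset each other.

```python
import math

# Placeholder numbers from the running example.
parallel_equiv_per_sd = 3.0
speedup_per_sd = 1.5
lam = math.log(speedup_per_sd) / math.log(parallel_equiv_per_sd)  # constant-DMR elasticity

def pace(parallel_equivalents: float) -> float:
    """Pace of progress under the constant-elasticity assumption: pace ∝ N**lam."""
    return parallel_equivalents ** lam

# Each extra SD of capability is booked as 3X more "parallel equivalents".
# With a constant elasticity, each 3X gives the same 1.5X speedup,
# regardless of how absurdly large N already is.
for n_base in [1e2, 1e4, 1e8]:
    ratio = pace(n_base * parallel_equiv_per_sd) / pace(n_base)
    print(f"N = {n_base:.0e}: one more SD multiplies the pace by {ratio:.2f}X")
```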
I just calculated what my median parameters imply about the productivity gains from marginally more qualitative capability. (FYI, I link to my data sources for calculating the marginal impact of more capable models—search for “upwards for improving capabilities”.) I’m implicitly assuming that a 10X increase in effective training compute increases the pace of researcher progress by 4X. (See sheet.)
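For illustration, here’s what that assumption implies for other compute multiples, assuming the per-OOM speedup is constant (an assumption of this sketch, not something argued for above):

```python
import math

# Assumption stated above: +1 OOM (10X) of effective training compute -> 4X faster researcher progress.
speedup_per_oom = 4.0

def speedup_from_compute_multiple(compute_multiple: float) -> float:
    """Speedup implied by a given multiple of effective training compute,
    assuming the per-OOM speedup is constant."""
    ooms = math.log10(compute_multiple)
    return speedup_per_oom ** ooms

print(f"2X compute  -> {speedup_from_compute_multiple(2):.2f}X pace")   # ≈ 1.52X
print(f"10X compute -> {speedup_from_compute_multiple(10):.2f}X pace")  # = 4.00X
```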
In the same sheet, I also just did a new calculation of the same quantity, using 1) AI-2027’s survey of lab researchers to estimate the productivity gain from replacing median researchers with experts, and 2) Ryan Greenblatt’s estimate of how much effective compute is needed to increase capability from a worse to a better human expert. This produced only 2.2X—and it might be higher if all the gains were used to increase qualitative capability (Ryan imagines some being used for more copies and faster thinking). (Edited to add: I initially thought this was only 1.5X, which was out of whack with my assumptions.)
That’s a fairly big discrepancy, suggesting I may have given too much weight to qualitative capability increases. Though combined with my other assumptions this calc implies that, on the current margin, doubling effective training compute and running the same number of copies would be worse than running 2X as many parallel copies, which seems wrong (see cell D9; a rough numeric version of this comparison is sketched below).
Anyway, that’s all to say that your question is a great one, and I’ve tried to incorporate the key consideration you’re pointing to, but as you predicted it’s hard to model well and so this is a big source of uncertainty in the model. It enters into my estimate of r.
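Here’s that rough numeric version (illustrative only—the sheet’s cell D9 may compute it differently, and the parallel-copy elasticity below is just the 3X→1.5X example from earlier, not the fitted parameter):

```python
import math

# Rough reproduction of the comparison discussed above.
speedup_per_oom = 2.2  # the per-OOM figure from the survey + Greenblatt-based calculation
lam = math.log(1.5) / math.log(3)  # parallel-copy elasticity from the earlier 3X -> 1.5X example

double_compute = speedup_per_oom ** math.log10(2)  # 2X effective training compute, same number of copies
double_copies = 2 ** lam                           # same compute, 2X as many parallel copies

print(f"2X effective compute: {double_compute:.2f}X pace")  # ≈ 1.27X
print(f"2X parallel copies:   {double_copies:.2f}X pace")   # ≈ 1.29X
# Under these illustrative numbers, doubling compute comes out (slightly) worse than
# doubling copies, which is the implication flagged above as seeming wrong.
```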
when you build a new AI that is now the most intelligent being in the world, it starts doing research and finds many ideas that are easy for it and near impossible for all the beings in the world that came before it.
This is an interesting argument for thinking that the returns to intelligence as you move above the human range will be bigger than the returns within that range. In principle you could measure that by considering the human who is literally the very best. Or by seeing how the returns to intelligence change as you approach that person. As you home in on the most intelligent person, they should increasingly benefit from this “there’s no one else to pluck the low-hanging fruit” effect.
And my guess is that doing this would tend to increase the estimated effect here. E.g. AI-2027’s survey compared the median to the top expert. But maybe if it had compared the 10th best to the very best, the predicted speed-up would have been similar, and so I’d have calculated a more intense speed-up per OOM of extra effective compute. (More than the 2.2X per OOM that I in fact calculated.)
From a skim, it seems you should be using the 6.25x value rather than the 2.5x in B2 of your sheet. If I’m skimming it correctly, 6.25x is the estimate for replacing a hypothetical all-median lab with a hypothetical all-top-researcher lab. This is what occurs when you improve your ASARA model. Whereas 2.5x is the estimate for replacing the actual lab with an all-top lab.
This still gives a value lower than 4x, but I think if you plug in reasonable log-normals, 4x will be within your 90% CI, and so it seems fine.
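As a purely illustrative check of the log-normal point (the median and spread below are made-up placeholders, not values from your sheet):

```python
# Purely illustrative check of the "reasonable log-normals" point.
median = 2.8          # hypothetical median of the per-OOM speedup
spread_factor = 2.5   # hypothetical: the 90% CI spans ~2.5X either side of the median

# For a log-normal, a 90% CI symmetric in log-space around the median is
# [median / spread, median * spread], where spread = exp(1.645 * sigma);
# here we just pick the spread factor directly.
lo, hi = median / spread_factor, median * spread_factor
print(f"90% CI ≈ [{lo:.2f}X, {hi:.2f}X] per OOM")  # ≈ [1.12X, 7.00X]
print("4X inside the CI:", lo <= 4.0 <= hi)        # True
```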
Thanks, great catch. Corrected this.
And I realise Ryan is seemingly assuming we only use some of the gains for better qualitative capabilities. So that would further reduce the discrepancy.