I think your discussion (and Epoch’s discussion) of the CES model is confused as you aren’t taking into account the possibility that we’re already bottlenecking on compute or labor. That is, I think you’re making some assumption about the current marginal returns which is non-obvious and, more strongly, would be an astonishing coincidence given that compute is scaling much faster than labor.
In particular, consider a hypothetical alternative world where they have the same amount of compute, but there is only 1 person (Bob) working on AI and this 1 person is as capable as the median AI company employee and also thinks 10x slower. In this alternative world they could also say “Aha, you see because ρ≈-0.4, even if we had billions of superintelligences running billions of times faster than Bob, AI progress would only go up to around 4x faster!”
Of course, this view is absurd because we’re clearly operating >>4x faster than Bob.
So, you need to make some assumptions about the initial conditions.
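To make the dependence on initial conditions concrete, here is a minimal sketch of the cap being discussed. It assumes the normalized CES form Y = (α·L^ρ + (1−α)·C^ρ)^(1/ρ), with both inputs measured as multiples of whatever baseline you pick; the function names and the α = 0.5 default are illustrative assumptions, and the exact cap (roughly 4x to 6x) depends on those choices.

```python
def ces(labor: float, compute: float, rho: float = -0.4, alpha: float = 0.5) -> float:
    """Normalized CES output; inputs are multiples of a chosen baseline."""
    return (alpha * labor**rho + (1 - alpha) * compute**rho) ** (1 / rho)


def labor_cap(rho: float = -0.4, alpha: float = 0.5) -> float:
    """Limit of ces(L, 1) as L -> infinity; only finite when rho < 0."""
    if rho >= 0:
        raise ValueError("there is no finite cap unless rho < 0")
    return (1 - alpha) ** (1 / rho)


# The cap is always stated relative to the baseline that was normalized to 1.
# Nothing in the formula says whether that baseline is today's labs or Bob.
print(labor_cap())      # about 5.66 with alpha = 0.5, rho = -0.4
print(ces(1e6, 1.0))    # a million-fold labor increase already sits near the cap
```

If Bob's world is taken as the baseline, the same formula with the same parameters caps progress at a few times Bob's pace, which is the contradiction the Bob example points at.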
This perspective implies an (IMO) even more damning issue with this exact modeling: the CES model is symmetric, so it also implies that additional compute (without labor) can only speed you up so much. I think the argument I’m about to explain strongly supports a lower value of ρ or some different functional form.
Consider another hypothetical world where the only compute they have is some guy with an abacus, but AI companies have the same employees they do now. In this alternative world, you could also have just as easily said “Aha, you see because ρ≈-0.4, even if we had GPUs that could do 1e15 FLOP/s (far faster than our current rate of 1e-1 fp8 FLOP/s), AI progress would only go around 4x faster!”
Further, the availability of compute for AI experiments has varied by around 8 orders of magnitude over the last 13 years! (AlexNet was 13 years ago.) The equivalent (parallel) human labor focused on frontier AI R&D has varied by more like 3 or maybe 4 orders of magnitude. (And the effective quality-adjusted serial labor, taking into account parallelization penalties etc, has varied by less than this, maybe by more like 2 orders of magnitude!)
Ok, but can we recover low-substitution CES? I think the only maybe consistent recovery (which doesn’t depend on insane coincidences about compute vs labor returns) would imply that compute was the bottleneck (back in AlexNet days) such that scaling up labor at the time wouldn’t yield ~any progress. Hmm, this doesn’t seem quite right.
Further, insofar as you think scaling up just labor when we had 100x less compute (3-4 years ago) would have still been able to yield some serious returns (seems obviously true to me...), then a low-substitution CES model would naively imply we have a sort of compute overhang where we can speed things up by >100x using more labor (after all, we’ve now added a bunch of compute, so time for more labor).
Minimally, the CES view predicts that AI companies should be spending less and less of their budget on compute as GPUs are getting cheaper. (Which seems very false.)
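As a rough illustration of why poor substitution predicts a shrinking compute budget share: for a cost-minimizing firm with CES technology, the ratio of compute spend to labor spend scales as (price ratio)^(1−σ), where σ = 1/(1−ρ). The sketch below assumes exactly that textbook setup; the function name, the α = 0.5 weights, and the price ratios are hypothetical.

```python
def compute_budget_share(price_ratio: float, rho: float = -0.4, alpha: float = 0.5) -> float:
    """Cost-minimizing share of spend going to compute under CES.

    price_ratio is (price of compute) / (price of labor), relative to a baseline.
    alpha is the weight on labor and 1 - alpha the weight on compute.
    """
    sigma = 1 / (1 - rho)  # elasticity of substitution
    spend_ratio = ((1 - alpha) / alpha) ** sigma * price_ratio ** (1 - sigma)
    return spend_ratio / (1 + spend_ratio)


# With rho < 0 (sigma < 1), cheaper compute means a smaller compute share.
for ratio in (1.0, 1e-1, 1e-3, 1e-6):
    print(ratio, round(compute_budget_share(ratio), 3))
```

With ρ > 0 the sign flips and the compute share rises as compute gets cheaper, which is the pattern the comment treats as the realistic one.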
Ok, but can we recover a view sort of like the low-substitution CES view where speeding up our current labor force by an arbitrary amount (also implying we could only use our best employees etc) would only yield ~10x faster progress? I think this view might be recoverable with some sort of non-symmetric model where we assume that labor can’t be the bottleneck in some wide regime, but compute can be the bottleneck.
(As in, you can always get faster progress by adding more compute, but the multiplier on top of this from adding labor caps out at some point which mostly doesn’t depend on how much compute you have. E.g., maybe this could be because at some point you just run bigger experiments with the same amount of labor and doing a larger number of smaller experiments is always worse and you can’t possibly design the experiment better. I think this sort of model is somewhat plausible.)
This model does make a somewhat crazy prediction where it implies that if you scale up compute and labor exactly in parallel, eventually further labor has no value. (I suppose this could be true, but seems a bit wild.)
Overall, I’m currently quite skeptical that arbitrary improvements in labor yield only small increases in the speed of progress. (E.g., an upper limit of 10x faster progress.)
As far as I can tell, this view either privileges exactly the level of human researchers at AI companies or implies that using only a smaller number of weaker and slower researchers wouldn’t alter the rate of progress that much.
In particular, consider a hypothetical AI company with the same resources as OpenAI except that they only employ aliens whose brains work 10x slower and for which the best researcher is roughly as good as OpenAI’s median technical employee. I think such an AI company would be much slower than OpenAI, maybe 10x slower (part from just lower serial speed and part from reduced capabilities). If you think such an AI company would be 10x slower, then by symmetry you should probably think that an AI company with 10x faster employees who are all as good as the best researchers should perhaps be 10x faster or more.[1] It would be surprising if the returns stopped at exactly the level of OpenAI researchers. And the same reasoning makes 100x speed ups once you have superhuman capabilities at massive serial speed seem very plausible.
As far as I can tell, this sort of consideration is at least somewhat damning for the literal CES model (with poor substitution) in any situation where the inputs have varied by hugely different amounts (many orders of magnitude of difference like in the compute vs labor case) and relative demand remains roughly similar, whereas this is totally expected under high substitution.
I don’t think this logic is quite right. In particular, the relative price of compute to labor has changed by many orders of magnitude over the same period (compute has decreased in price a lot, wages have grown). Unless compute and labor are perfect complements, you should expect the ratio of compute to labor to be changing as well.
To understand how substitutable compute and labor are, you need to see how the ratio of compute to labor is changing relative to price changes. We try to run through these numbers here.
Yeah, you can get into other fancy tricks to defend it like:
Input-specific technological progress. Even if labour has grown more slowly than capital, maybe the ‘effective labour supply’, which includes tech that makes labour more productive (e.g. drinking caffeine, writing faster on a laptop), has grown as fast as capital.
Input-specific ‘stepping on toes’ adjustments. If capital grows at 10%/year and labour grows at 5%/year, but (effective capital) = capital^0.5 and (effective labour) = labour, then the growth rates of effective labour and effective capital are roughly equal (1.10^0.5 ≈ 1.049).
I buy your core argument against the main CES model I presented. I think your key argument (“the relative quantity of labour vs compute has varied by OOMs; If CES were true, then one input would have become a hard bottleneck; but it hasn’t.”) is pretty compelling as an objection to the simple naive CES I mention in the post. It updates me even further towards thinking that, if you use this naive CES, you should have ρ> −0.2. Thanks!
The core argument is less powerful against a more realistic CES model that replaces ‘compute’ with ‘near-frontier sized experiments’. I’m less sure how strong it is as an argument against the more-plausible version of the CES where rather than inputs of cognitive labour and compute we have inputs of cognitive labour and number of near-frontier-sized experiments. (I discuss this briefly in the post.) I.e. if a lab has total compute C_t, its frontier training run takes C_f compute, and we say that a ‘near-frontier-sized’ experiment uses 1% as much compute as training a frontier model, then the number of near-frontier sized experiments that the lab could run equals E = 100* C_t / C_f
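A minimal sketch of the E definition above, to show why E can stay flat even while total compute grows by many OOMs. The function name and all the compute figures are made up for illustration; only the formula E = 100 * C_t / C_f (with the 1% experiment size) comes from the comment.

```python
import math


def near_frontier_experiments(total_compute: float,
                              frontier_run_compute: float,
                              experiment_fraction: float = 0.01) -> float:
    """Number of near-frontier-sized experiments a lab could run.

    Each experiment is assumed to cost experiment_fraction of a frontier
    training run, so E = C_t / (experiment_fraction * C_f).
    """
    return total_compute / (experiment_fraction * frontier_run_compute)


# Hypothetical lab: 1e26 FLOP of total compute, 1e25 FLOP frontier run.
e_then = near_frontier_experiments(1e26, 1e25)

# Scale total compute and the frontier run together by 4 OOMs: E is unchanged.
e_now = near_frontier_experiments(1e30, 1e29)
print(e_then, e_now, math.isclose(e_then, e_now))
```

E only moves when total compute and frontier run size scale at different rates, which is the case considered in the next paragraphs.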
With this formulation, it’s no longer true that a key input has increased by many OOMs, which was the core of your objection (at least the part of your objection that was about the actual world rather than about hypotheticals; I discuss hypotheticals below).
Unlike compute, E hasn’t grown by many OOMs over the last decade. How has it changed? I’d guess it’s gotten a bit smaller over the past decade as labs have scaled frontier training runs faster than they’ve scaled their total quantity of compute for running experiments. But maybe labs have scaled both at equal pace, especially as in recent years the size of pre-training has been growing more slowly (excluding GPT-4.5).
So this version of the CES hypothesis fares better against your objection, bc the relative quantity of the two inputs (cognitive labour and number of near-frontier experiments) has changed by less over the past decade. Cognitive labour inputs have grown by maybe 2 OOMs over the past decade, but the ‘effective labour supply’, adjusting for diminishing quality and stepping-on-toes effects, has grown by maybe just 1 OOM. With just 1 OOM relative increase in cognitive labour, the CES function with ρ=-0.4 implies that compute will have become more of a bottleneck, but not a complete bottleneck such that more labour isn’t still useful. And that seems roughly realistic.
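One way to put a number on “more of a bottleneck, but not a complete bottleneck” is the output elasticity of labour under CES, which is α·L^ρ / (α·L^ρ + (1−α)·E^ρ). The sketch below assumes the normalized CES form with α = 0.5 at the starting point (so both elasticities begin at 0.5); the function name and the example multiples are illustrative assumptions.

```python
def labor_elasticity(labor: float, experiments: float,
                     rho: float = -0.4, alpha: float = 0.5) -> float:
    """Elasticity of CES output with respect to labor (d ln Y / d ln L).

    Inputs are multiples of the starting point, where both equal 1.
    """
    labor_term = alpha * labor**rho
    experiment_term = (1 - alpha) * experiments**rho
    return labor_term / (labor_term + experiment_term)


print(labor_elasticity(1, 1))      # 0.5 at the starting point
print(labor_elasticity(10, 1))     # about 0.28 after a 1 OOM relative rise in labour
print(labor_elasticity(1e8, 1))    # well under 0.01 after an 8 OOM relative rise
```

So a 1 OOM relative shift leaves labour clearly useful at the margin, while a relative shift of many OOMs would push its marginal value toward zero under the same ρ.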
Minimally, the CES view predicts that AI companies should be spending less and less of their budget on compute as GPUs are getting cheaper. (Which seems very false.)
This version of the CES hypothesis also dodges this objection. AI companies need to spend much more on compute over time just to keep E constant and avoid compute becoming a bottleneck.
This model does make a somewhat crazy prediction where it implies that if you scale up compute and labor exactly in parallel, eventually further labor has no value. (I suppose this could be true, but seems a bit wild.)
Doesn’t seem that wild to me? When we scale up compute we’re also scaling up the size of frontier training runs; maybe past a certain point running smaller experiments just isn’t useful (e.g. you can’t learn anything from experiments using 1 billionth of the compute of a frontier training run); and maybe past a certain point you just can’t design better experiments. (Though I agree with you that this is all unlikely to bite before a 10X speed up.)
consider a hypothetical AI company with the same resources as OpenAI except that they only employ aliens whose brains work 10x slower and for which the best researcher is roughly as good as OpenAI’s median technical employee
Nice thought experiment.
So the near-frontier-experiment version of the CES hypothesis would say that those aliens would be in a world where experimental compute isn’t a bottleneck on AI progress at all: the aliens don’t have time to write the code to run the experiments they have the compute for! And we know we’re not in that world because experiments are a real bottleneck on our pace of progress already: researchers say they want more compute! These hypothetical aliens would make no such requests. It may be a weird empirical coincidence that cognitive labour helps up to our current level but not that much further, but we can confirm with the evidence of the marginal value of compute in our world.
But actually I do agree the CES hypothesis is pretty implausible here. More compute seems like it would still be helpful for these aliens: e.g. automated search over different architectures and running all experiments at large scale. And evolution is an example where the “cognitive labour” going into AI R&D was very very very minimal and still having lots of compute to just try stuff out helped.
So I think this alien hypothetical is probably the strongest argument against the near-frontier experiment version of the CES hypothesis. I don’t think it’s devastating—the CES-advocate can bite the bullet and claim that more compute wouldn’t be at all useful in that alien world.
(Fwiw I preferred the way you described that hypothesis before your last edit.)
You can also try to ‘block’ the idea of a 10X speed up by positing a large ‘stepping on toes’ effect. If it’s very important to do experiments in series and experiments can’t be sped up past a certain point, then experiments could still bottleneck progress. This wouldn’t be about the quantity of compute being a bottleneck per se, so it avoids your objection. Instead the bottleneck is ‘number of experiments you can run per day’. Mathematically, you could represent this by something like:
AI progress per week = log(1000 + L^0.5 * E^0.5)
The idea is that there are ~linear gains to research effort initially, but past a certain point returns start to diminish increasingly steeply such that you’d struggle to ever realistically 10X the pace of progress.
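To get a feel for how steeply returns diminish under this functional form, here is a small sketch. It takes log(1000 + L^0.5 * E^0.5) at face value with a natural log (the base doesn’t affect ratios of progress rates); the helper names and the baseline effort of 1000 are hypothetical choices.

```python
import math


def progress_per_week(labor: float, experiments: float, offset: float = 1000.0) -> float:
    """AI progress per week = log(offset + L^0.5 * E^0.5)."""
    return math.log(offset + math.sqrt(labor * experiments))


def effort_needed(multiplier: float, baseline_effort: float, offset: float = 1000.0) -> float:
    """Value of L^0.5 * E^0.5 needed to multiply the baseline pace by `multiplier`."""
    return (offset + baseline_effort) ** multiplier - offset


baseline_effort = 1000.0  # hypothetical current value of L^0.5 * E^0.5
for m in (1.5, 2, 10):
    needed = effort_needed(m, baseline_effort)
    print(m, f"{needed / baseline_effort:.3g}x more effort")
```

Under these assumptions, doubling the pace already takes a few thousand times more effort, and a 10X speed up takes an astronomically large multiple, which is the sense in which this form ‘blocks’ a 10X speed up.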
Ultimately, I don’t really buy this argument. If you applied this functional form to other areas of science you’d get the implication that there’s no point scaling up R&D past a certain point, which has never happened in practice. And I even think the functional form underestimates how much you could improve experiment quality and how much you could speed up experiments. And you have to cherry-pick the constant so that we get a big benefit from going from the slow aliens to OAI-today, but limited benefit from going from today to ASI.
Still, this kind of serial-experiment bottleneck will apply to some extent so it seems worth highlighting that this bottleneck isn’t affected by the main counterargument you made.
I don’t think my treatment of initial conditions was confused
I think your discussion (and Epoch’s discussion) of the CES model is confused as you aren’t taking into account the possibility that we’re already bottlenecking on compute or labor...
In particular, consider a hypothetical alternative world where they have the same amount of compute, but there is only 1 person (Bob) working on AI and this 1 person is as capable as the median AI company employee and also thinks 10x slower. In this alternative world they could also say “Aha, you see because ρ≈-0.4, even if we had billions of superintelligences running billions of times faster than Bob, AI progress would only go up to around 4x faster!”
Of course, this view is absurd because we’re clearly operating >>4x faster than Bob.
So, you need to make some assumptions about the initial conditions....
Consider another hypothetical world where the only compute they have is some guy with an abacus, but AI companies have the same employees they do now. In this alternative world, you could also have just as easily said “Aha, you see because ρ≈-0.4, even if we had GPUs that could do 1e15 FLOP/s (far faster than our current rate of 1e-1 fp8 FLOP/s), AI progress would only go around 4x faster!”
My discussion does assume that we’re not currently bottlenecked on either compute or labour, but I think that assumption is justified. It’s quite clear that labs both want more high-quality researchers: top talent has very high salaries, reflecting large marginal value-add. It’s also clear that researchers want more compute, again reflecting large marginal value-add. So it seems clear that we’re not strongly bottlenecked by just one of compute or labour currently. That’s why I used α=0.5, assuming that the elasticity of progress to both inputs is equal. (I don’t think this is exactly right, but seems in the right rough ballpark.)
I think your thought experiments about the world with just one researcher and the world with just an abacus are an interesting challenge to the CES function, but don’t imply that my treatment of initial conditions was confused.
I actually don’t find those two examples very convincing though as challenges to the CES. In both those worlds it seems pretty plausible that the scarce input would be a hard bottleneck on progress. If all you have is an abacus, then probably the value of the marginal AI researcher would be ~0 as they’d have no compute to use. And so you couldn’t run the argument “ρ=-0.4 and so more compute won’t help much” because in that world (unlike our world) it will be very clear that compute is a hard bottleneck to progress and cognitive labour isn’t helpful. And similarly, in the world with just Bob doing AI R&D, it’s plausible that AI companies would have ~0 willingness to pay for more compute for experiments, as Bob can’t use the compute that he’s already got; labour is the hard bottleneck. So again you couldn’t run the argument based on ρ=-0.4 bc that argument only works if neither input is currently a hard bottleneck.
It’s quite clear that labs both want more high-quality researchers—top talent has very high salaries, reflecting large marginal value-add.
Three objections, one obvious. I’ll state them strongly, a bit devil’s advocate; not sure where I actually land on these things.
Obvious: salaries aren’t that high.
Also, I model a large part of the value to companies of legible credentialed talent as being the marketing value to VCs and investors, who (even if lab leadership can) can’t tell talent apart except by (rare) legible signs. This is actually a way to get more compute (and other capital). (The legible signs are rare because compute is a bottleneck! So a Matthew effect pertains.)
Finally, the utility of labs is very convex in the production of AI: the actual profit comes from time spent selling a non-commoditised frontier offering at large margin. So small AI production speed gains translate into large profit gains.
I also noticed this! And wondered how much evidence it is (and of what). I don’t think of Meta as especially rational in its AI-related behaviour. Maybe this is Zuckerberg trying to make up for LeCun’s years of bad judgement.
Hmm, I actually kind of lean towards it being rational, and labs just underspending on labor vs. capital for contingent historical/cultural reasons. I do think a lot of the talent juice is in “banal” progress like efficiently running lots of experiments, and iterating on existing ideas straightforwardly (as opposed to something like “only a few people have the deep brilliance/insight to make progress”), but that doesn’t change the upshot IMO.
Doesn’t seem that wild to me? When we scale up compute we’re also scaling up the size of frontier training runs; maybe past a certain point running smaller experiments just isn’t useful (e.g. you can’t learn anything from experiments using 1 billionth of the compute of a frontier training run); and maybe past a certain point you just can’t design better experiments. (Though I agree with you that this is all unlikely to bite before a 10X speed up.)
Yes, but also, if the computers are getting serially faster, then you also have to be able to respond to the results and implement the next experiment faster as you add more compute. E.g., imagine a (physically implausible) computer which can run any experiment which uses less than 1e100 FLOP in less than a nanosecond. To maximally utilize this, you’d want to be able to respond to results and implement the next experiment in less than a nanosecond as well. This is of course an unhinged hypothetical and in this world, you’d also be able to immediately create superintelligence by e.g. simulating a huge evolutionary process.
[1] I edited this to be a hopefully more clear description.
Thanks, this is a great comment.
I buy your core argument against the main CES model I presented. I think your key argument (“the relative quantity of labour vs compute has varied by OOMs; If CES were true, then one input would have become a hard bottleneck; but it hasn’t.”) is pretty compelling as an objection to the simple naive CES I mention in the post. It updates me even further towards thinking that, if you use this naive CES, you should have ρ> −0.2. Thanks!
The core argument is less powerful against a more realistic CES model that replaces ‘compute’ with ‘near-frontier sized experiments’. I’m less sure how strong it is as an argument against the more-plausible version of the CES where rather than inputs of cognitive labour and compute we have inputs of cognitive labour and number of near-frontier-sized experiments. (I discuss this briefly in the post.) I.e. if a lab has total compute C_t, its frontier training run takes C_f compute, and we say that a ‘near-frontier-sized’ experiment uses 1% as much compute as training a frontier model, then the number of near-frontier sized experiments that the lab could run equals E = 100* C_t / C_f
With this formulation, it’s no longer true that a key input has increased by many OOMs, which was the core of your objection (at least the part of your objection that was about the actual world rather than about hypotheticals—i discuss hypotheticals below.)
Unlike compute, E hasn’t grown by many OOMs over the last decade. How has it changed? I’d guess it’s gotten a bit smaller over the past decade as labs have scaled frontier training runs faster than they’ve scaled their total quantity of compute for running experiments. But maybe labs have scaled both at equal pace, especially as in recent years the size of pre-training has been growing more slowly (excluding GPT-4.5).
So this version of the CES hypothesis fares better against your objection, bc the relative quantity of the two inputs (cognitive labour and number of near-frontier experiments) have changed by less over the past decade. Cognitive labour inputs have grown by maybe 2 OOMs over the past decade, but the ‘effective labour supply’, adjusting for diminishing quality and stepping-on-toes effects has grown by maybe just 1 OOM. With just 1 OOM relative increase in cognitive labour, the CES function with ρ=-0.4 implies that compute will have become more of a bottleneck, but not a complete bottleneck such that more labour isn’t still useful. And that seems roughly realistic.
This version of the CES hypothesis also dodges this objection. AI companies need to spend much more on compute over time just to keep E constant and avoid compute becoming a bottleneck.
Doesn’t seem that wild to me? When we scale up compute we’re also scaling up the size of frontier training runs; maybe past a certain point running smaller experiments just isn’t useful (e.g. you can’t learn anything from experiments using 1 billionth of the compute of a frontier training run); and maybe past a certain point you just can’t design better experiments. (Though I agree with you that this is all unlikely to bite before a 10X speed up.)
Nice thought experiment.
So the near-frontier-experiment version of the CES hypothesis would say that those aliens would be in a world where experimental compute isn’t a bottleneck on AI progress at all: the aliens don’t have time to write the code to run the experiments they have the compute for! And we know we’re not in that world because experiments are a real bottleneck on our pace of progress already: researchers say they want more compute! These hypothetical aliens would make no such requests. It may be a weird empirical coincidence that cognitive labour helps up to our current level but not that much further, but we can confirm with the evidence of the marginal value of compute in our world.
But actually I do agree the CES hypothesis is pretty implausible here. More compute seems like it would still be helpful for these aliens: e.g. automated search over different architectures and running all experiments at large scale. And evolution is an example where the “cognitive labour” going into AI R&D was very very very minimal and still having lots of compute to just try stuff out helped.
So I think this alien hypothetical is probably the strongest argument against the near-frontier experiment version of the CES hypothesis. I don’t think it’s devastating—the CES-advocate can bite the bullet and claim that more compute wouldn’t be at all useful in that alien world.
(Fwiw i preferred the way you described that hypothesis before your last edit.)
You can also try to ‘block’ the idea of a 10X speed up by positing a large ‘stepping on toes’ effect. If it’s v important to do experiments in series and that experiments can’t be sped up past a certain point, then experiments could still bottleneck progress. This wouldn’t be about the quantity of compute being a bottleneck per se, so it avoids your objection. Instead the bottleneck is ‘number of experiments you can run per day’. Mathematically, you could represent this by smg like:
AI progress per week = log(1000 + L^0.5 * E^0.5)
The idea is that there are ~linear gains to research effort initially, but past a certain point returns start to diminish increasingly steeply such that you’d struggle to ever realistically 10X the pace of progress.
Ultimately, I don’t really buy this argument. If you applied this functional form to other areas of science you’d get the implication that there’s no point scaling up R&D past a certain point, which has never happened in practice. And even then, I think the functional form underestimates how much you could improve experiment quality and how much you could speed up experiments. And you have to cherry-pick the constant so that we get a big benefit from going from the slow aliens to OAI-today, but limited benefit from going from today to ASI.
Still, this kind of serial-experiment bottleneck will apply to some extent, so it seems worth highlighting that this bottleneck isn’t affected by the main counterargument you made.
I don’t think my treatment of initial conditions was confused
My discussion does assume that we’re not currently bottlenecked on either compute or labour, but I think that assumption is justified. It’s quite clear that labs want more high-quality researchers: top talent has very high salaries, reflecting large marginal value-add. It’s also clear that researchers want more compute, again reflecting large marginal value-add. So it seems clear that we’re not strongly bottlenecked by just one of compute or labour currently. That’s why I used α=0.5, assuming that the elasticity of progress to both inputs is equal. (I don’t think this is exactly right, but it seems in the right rough ballpark.)
I think your thought experiments about the world with just one researcher and the world with just an abacus are an interesting challenge to the CES function, but don’t imply that my treatment of initial conditions was confused.
I actually don’t find those two examples very convincing though as challenges to the CES. In both those worlds it seems pretty plausible that the scarce input would be a hard bottleneck on progress. If all you have is an abacus, then probably the value of the marginal AI researcher would be ~0 as they’d have no compute to use. And so you couldn’t run the argument “ρ=−0.4 and so more compute won’t help much” because in that world (unlike our world) it will be very clear that compute is a hard bottleneck to progress and cognitive labour isn’t helpful. And similarly, in the world with just Bob doing AI R&D, it’s plausible that AI companies would have ~0 willingness to pay for more compute for experiments, as Bob can’t use the compute that he’s already got; labour is the hard bottleneck. So again you couldn’t run the argument based on ρ=−0.4 because that argument only works if neither input is currently a hard bottleneck.
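For readers who want to see how much the starting point matters here, a rough Python sketch follows. It assumes the standard textbook CES form Y = (α·L^ρ + (1−α)·C^ρ)^(1/ρ) with the α=0.5 and ρ=−0.4 discussed above. The normalisation (L = C = 1 meaning "balanced inputs") and the function names are assumptions made for illustration, not something taken from the original post.

```python
def ces(labour, compute, alpha=0.5, rho=-0.4):
    """Standard CES aggregate: (alpha*L^rho + (1-alpha)*C^rho)^(1/rho)."""
    return (alpha * labour**rho + (1 - alpha) * compute**rho) ** (1 / rho)

def max_speedup_from_labour(labour, compute, alpha=0.5, rho=-0.4):
    """Ceiling on the speed-up from adding unlimited labour, holding compute fixed.

    For rho < 0, L^rho -> 0 as L -> infinity, so output tends to
    (1 - alpha)^(1/rho) * compute.
    """
    ceiling = (1 - alpha) ** (1 / rho) * compute
    return ceiling / ces(labour, compute, alpha, rho)

# Balanced starting point (assumed normalisation): the cap is 2^2.5, about 5.66x.
balanced_cap = max_speedup_from_labour(1.0, 1.0)
print(f"cap from balanced inputs: {balanced_cap:.2f}x")

# Labour-starved starting point (a Bob-like world, 1000x less labour):
# the same rho now permits a far larger speed-up from extra labour.
starved_cap = max_speedup_from_labour(0.001, 1.0)
print(f"cap from labour-starved inputs: {starved_cap:.0f}x")
```

The same ρ gives a small cap only when the inputs start out roughly balanced. When one input is already the scarce one, the implied cap on gains from that input is orders of magnitude larger, so the "only a few times faster" conclusion depends on the assumption about where we currently sit.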
Three objections, one obvious. I’ll state them strongly, a bit devil’s advocate; not sure where I actually land on these things.
Obvious: salaries aren’t that high.
Also, I model a large part of the value to companies of legible credentialed talent being the marketing value to VCs and investors, who (even if lab leadership can) can’t tell talent apart except by (rare) legible signs. This is actually a way to get more compute (and other capital). (The legible signs are rare because compute is a bottleneck! So a Matthew effect pertains.)
Finally, the utility of labs is very convex in the production of AI: the actual profit comes from time spent selling a non-commoditised frontier offering at large margin. So small AI production speed gains translate into large profit gains.
Salaries have indeed now gotten pretty high—it seems like they’re within an OOM of compute spend (at least at Meta).
I also noticed this! And wondered how much evidence it is (and of what). I don’t think of Meta as especially rational in its AI-related behaviour. Maybe this is Zuckerberg trying to make up for LeCun’s years of bad judgement.
Hmm, I actually kind of lean towards it being rational, and labs just underspending on labor vs. capital for contingent historical/cultural reasons. I do think a lot of the talent juice is in “banal” progress like efficiently running lots of experiments, and iterating on existing ideas straightforwardly (as opposed to something like “only a few people have the deep brilliance/insight to make progress”), but that doesn’t change the upshot IMO.
Yes, but also, if the computers are getting serially faster, then you also have to be able to respond to the results and implement the next experiment faster as you add more compute. E.g., imagine a (physically implausible) computer which can run any experiment which uses less than 1e100 FLOP in less than a nanosecond. To maximally utilize this, you’d want to be able to respond to results and implement the next experiment in less than a nanosecond as well. This is of course an unhinged hypothetical and in this world, you’d also be able to immediately create superintelligence by e.g. simulating a huge evolutionary process.