I do think that progress will slow down, though it’s not my main claim. My main claim is that the tailwind of compute scaling will become weaker (unless some new scaling paradigm appears or a breakthrough saves this one). That is one piece in the puzzle of whether overall AI progress will accelerate or decelerate, and I’d ideally let people form their own judgments about the other pieces (e.g. whether recursive self-improvement will work, or whether funding will collapse in a market correction, taking away another tailwind of progress). But having a major boost to AI progress (compute scaling) become less of a boost is definitely some update towards slower AI progress than you were otherwise expecting.
How much of an issue it is for inference scaling to be the main surviving form of scaling depends on how many more OOMs are needed. If it is 100x, there isn’t so much impact. If we need to 1,000x or 1,000,000x it from here, it is more of a problem.
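To make the OOM question concrete, here is a minimal sketch (my numbers, not anything measured): if algorithmic efficiency gains and cheaper chips cut cost per token k-fold per year, then the years needed to absorb a given inference scale-up at constant cost per query grow logarithmically, but the difference between 100x and 1,000,000x is still large.

```python
import math

# Hypothetical: cost per token falls k-fold per year from algorithmic
# efficiency gains and cheaper chips (k = 3 is a placeholder, not a
# measured figure).
def years_to_absorb(scale_up: float, k: float = 3.0) -> float:
    """Years of cost decline needed to hold cost per query constant
    while multiplying inference per query by `scale_up`."""
    return math.log(scale_up) / math.log(k)

for s in (100, 1_000, 1_000_000):
    print(f"{s:,}x inference -> ~{years_to_absorb(s):.1f} years of decline")
# 100x -> ~4.2 years, 1,000x -> ~6.3 years, 1,000,000x -> ~12.6 years
```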
In that prior piece I talked about inference scaling as a flow of costs, but those costs scale with more than just time (a rough sketch follows this list):
costs grow in proportion to time (can’t make up the costs through longer use before the next model arrives)
costs grow in proportion to number of users (can’t make up the costs through market expansion)
costs grow in proportion to the amount of use by each user (can’t make up costs through intensity of use)
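A minimal sketch of this multiplicative structure, with made-up placeholder numbers: because inference cost is a flow that multiplies across all three dimensions, there is no fixed outlay that extra usage can amortize.

```python
# Placeholder numbers throughout; the point is the multiplicative shape.
def monthly_inference_cost(cost_per_query: float,
                           queries_per_user: float,
                           num_users: int) -> float:
    # Cost scales jointly with per-query spend, intensity of use,
    # and the size of the user base.
    return cost_per_query * queries_per_user * num_users

base   = monthly_inference_cost(0.01, 300, 1_000_000)  # $3M / month
scaled = monthly_inference_cost(1.00, 300, 1_000_000)  # $300M / month
print(scaled / base)  # 100.0: longer deployment, more users, or heavier
                      # use scale cost and revenue together, so none of
                      # them dilutes the extra per-query spend the way
                      # a one-off training cost gets diluted
```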
This multiplicative structure is a big deal. If you want to 100x the cost of inference going into each query, how can you make that up and still be profitable? I think you need to 100x the willingness-to-pay of each user for each query. That is very hard. My guess is that WTP doesn’t scale with inference compute in this way, and thus that inference can only be 10x-ed once algorithmic efficiency gains and falling chip costs have divided the cost per token by 10. So while previous rounds of training-compute scaling could pay for themselves in the marketplace, I think that will stop being true for most users soon, and for specialist users a bit later.
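Stated as a break-even condition (my formalisation of the paragraph above, not the author’s notation): tokens per query can only grow by the factor that cost-per-token declines and WTP growth jointly allow.

```python
# Break-even sketch: per-query cost must stay within per-query WTP.
def affordable_token_scaleup(cost_per_token_decline: float,
                             wtp_growth: float = 1.0) -> float:
    """Max factor by which tokens per query can grow while keeping
    cost per query within willingness to pay."""
    return cost_per_token_decline * wtp_growth

print(affordable_token_scaleup(10.0))        # tokens 10x cheaper, flat WTP -> 10x
print(affordable_token_scaleup(10.0, 10.0))  # plus 10x WTP -> 100x
```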
The idea here is that the changing character of scaling affects the business model: it will no longer be self-propelling to keep scaling, and this will mean that compute scaling basically stops.
PS Thanks for pointing out that second quote “Now that RL-training…” — I think that does come across a bit stronger than I intended.
“inference scaling as the main surviving form of scaling” --> But it isn’t, though; RL is still a very important form of scaling. Yes, it’ll become harder to scale up RL in the near future (recently they could just allocate more of their existing compute budget to RL, but soon they’ll need to grow their compute budget), so there’ll be a slowdown from that effect. But it seems to me that the next three OOMs of RL scaling will bring at least as much benefit as the previous three OOMs, which were substantial, as you say, largely because they ‘unlocked’ more inference-compute scaling. The next three OOMs of RL scaling will ‘unlock’ even more.
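The budget-reallocation point is easy to see with a toy calculation (illustrative shares, not real figures): while RL is a tiny slice of the training budget, you can scale it enormously just by reallocating; once it dominates, further scaling requires growing the whole budget.

```python
# Illustrative only: how much RL compute can still be scaled by
# reallocating a fixed total budget, given RL's current share of it.
def reallocation_headroom(rl_share: float) -> float:
    """Max multiplier on RL compute without growing the total budget."""
    return 1.0 / rl_share

for share in (0.001, 0.01, 0.1, 1.0):
    print(f"RL at {share:.1%} of budget -> "
          f"{reallocation_headroom(share):,.0f}x headroom")
```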
Re: Willingness to pay going up: Yes, that’s what I expect. I don’t think it’s hard at all. If you do a bunch of RL scaling that ‘unlocks’ more inference scaling—e.g. by extending METR-measured horizon length—then boom, now your models can do significantly longer, more complex tasks than before. Those tasks are significantly more valuable and people will be willing to pay significantly more for them.
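As a toy illustration of why WTP might scale this way (the rate and the linear-in-hours assumption are mine, purely for illustration): if a completed task is worth roughly the human labour it replaces, then extending horizon length raises what a customer will pay per query in step with it.

```python
HUMAN_RATE = 50.0  # $/hour of replaced labour; purely a placeholder

def wtp_for_task(horizon_hours: float) -> float:
    # If a model can complete a task of this length end-to-end, WTP is
    # bounded below by the cost of the human labour it replaces.
    return HUMAN_RATE * horizon_hours

for h in (0.5, 4.0, 40.0):  # sub-hour, day-scale, and week-scale tasks
    print(f"{h:>4}h horizon -> WTP up to ~${wtp_for_task(h):,.0f} per task")
```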