“inference scaling as the main surviving form of scaling” --> But it isn’t, though; RL is still a very important form of scaling. Yes, it’ll become harder to scale up RL in the near future (until recently they could just allocate more of their existing compute budget to RL, but soon they’ll need to grow the compute budget itself), so there’ll be a slowdown from that effect. But it seems to me that the next three OOMs of RL scaling will bring at least as much benefit as the previous three OOMs, which were substantial, as you say, largely because they ‘unlocked’ more inference-compute scaling. The next three OOMs will ‘unlock’ even more.
Re: Willingness to pay going up: Yes, that’s what I expect. I don’t think it’s hard at all. If you do a bunch of RL scaling that ‘unlocks’ more inference scaling (e.g. by extending METR-measured horizon length), then boom: your models can now do significantly longer, more complex tasks than before. Those tasks are significantly more valuable, and people will be willing to pay significantly more for them.
I’m a bit confused here. Your first paragraph seems to end up agreeing with me, i.e. that RL scaling derives most of its importance from enabling inference scaling and is dependent on it. I’m not sure we really have any disagreement there; I’m not saying people will stop doing any RL.
Re WTP, I do think it is quite hard to scale. For example, consider consumer use. Many people are paying ~$1 per day for AI access (the $20/month subscriptions). If companies need to 1000x inference in order to get the equivalent of a GPT-level jump in capability, then consumers would need to pay ~$1,000 per day, which most people won’t do (and can’t do). Indeed, I think $10 per day is about the upper limit of what we could see for most people in the nearish future (= $3,650 per year, which is much more than they pay for their computer plus their phone). Maybe $30 per day, if it reaches the total cost of owning a car (still only 1.5 OOM above current prices). But I can’t really imagine it reaching that level for just the current amount of use (at higher intelligence); I think that would only be reached if there were much more use too. Therefore, I see only ~1 OOM of increase in cost per query being possible here for consumer use. That means an initial 1 OOM of inference scaling, after which the inference used per query could only grow at the speed of efficiency gains (~0.5 OOM per year) while keeping a constant price, meaning the extra inference absorbs the efficiency gains rather than lowering prices.
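To make the order-of-magnitude arithmetic here explicit, a quick sketch (the dollar figures are the illustrative ones from this comment, not data):

```python
import math

# Illustrative consumer-price figures from the comment above (assumptions, not data).
current = 1.0         # ~$1/day, i.e. a $20/month subscription
plausible_cap = 10.0  # ~$10/day: suggested upper limit for most people
stretch_cap = 30.0    # ~$30/day: comparable to the total cost of owning a car

def ooms(ratio: float) -> float:
    """Orders of magnitude separating two quantities (log base 10 of their ratio)."""
    return math.log10(ratio)

print(f"$10/day over a year: ${plausible_cap * 365:,.0f}")             # $3,650
print(f"$1/day -> $10/day: {ooms(plausible_cap / current):.1f} OOM")   # 1.0 OOM
print(f"$1/day -> $30/day: {ooms(stretch_cap / current):.2f} OOM")     # ~1.48 OOM

# A 1000x inference scale-up at today's efficiency would instead imply:
print(f"1000x inference: ${current * 1000:,.0f}/day ({ooms(1000):.0f} OOM)")

# At ~0.5 OOM/year of efficiency gains and a constant price, each further
# OOM of inference per query takes ~2 years to become affordable.
print(f"Years per extra OOM at constant price: {1 / 0.5:.0f}")
```

On these numbers, consumer prices have only about 1 to 1.5 OOM of headroom, versus the 3 OOM a 1000x inference scale-up would demand; that gap is the point being made here.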
But it is different for non-consumer use-cases. Maybe there are industrial areas where it is more plausible that customers would pay 100x or 1000x as much for the same number of queries to a somewhat more intelligent system (e.g. coding). I’m a bit skeptical, though. I really think current scaling paying for itself was driven by being able to scale up the number of users and the number of queries per API user, and both of those levers stop working here, which is a big deal.