I’m a bit confused here. Your first paragraph seems to end up agreeing with me, i.e. that RL scaling derives most of its importance from enabling inference scaling and is dependent on it. I’m not sure we really have any disagreement there; I’m not saying people will stop doing any RL.
Re WTP, I do think it is quite hard to scale. For example, consider consumer use. Many people are paying ~$1 per day for AI access (the $20/month subscriptions). If companies need to 1000x inference to get the equivalent of a GPT-level jump in capability, then consumers would need to pay ~$1,000 per day, which most people won’t do (and can’t do). Indeed, I think $10 per day is about the upper limit of what we could see for most people in the nearish future; that is $3,650 per year, which is much more than they pay for their computer plus phone combined. Maybe $30 per day, if it reaches the total cost of owning a car (still only 1.5 OOM above current prices). But I can’t really imagine prices reaching that level for just the current amount of use (at higher intelligence); I think that level would only be reached if there were much more use too. So I see only a 1 OOM increase in cost per query as possible for consumer use. That means an initial 1 OOM of inference scaling, after which the inference used per query could only grow at the speed of efficiency gains (~0.5 OOM per year) while the price stays constant, meaning inference scaling absorbs the efficiency gains rather than prices falling.
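To make the arithmetic explicit, here is a minimal sketch of the OOM bookkeeping I have in mind. All the numbers (the $1/day baseline, the $10/day ceiling, the 0.5 OOM/year efficiency rate, the 1000x target) are the illustrative assumptions from the paragraph above, not data:

```python
import math

# Illustrative assumptions (from the argument above, not measured data):
current_cost_per_day = 1.0          # ~$20/month subscription ≈ $1/day
ceiling_cost_per_day = 10.0         # guessed consumer upper limit
efficiency_oom_per_year = 0.5       # assumed inference-efficiency gains

# Headroom for raising the price per query before hitting the ceiling:
headroom_oom = math.log10(ceiling_cost_per_day / current_cost_per_day)
print(f"Price headroom: {headroom_oom:.1f} OOM")   # 1.0 OOM

# Beyond that, inference per query can only grow at the efficiency-gain
# rate while price stays flat. E.g. to reach 3 OOM (1000x) of total
# inference scaling, the remaining 2 OOM must come from efficiency gains:
target_oom = 3.0
years_needed = (target_oom - headroom_oom) / efficiency_oom_per_year
print(f"Years of efficiency gains to cover the rest: {years_needed:.0f}")  # 4
```

Under these assumptions, a 1000x inference scale-up for consumers takes one quick OOM of price increases and then roughly four years of efficiency gains that get eaten by inference rather than lowering prices.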
But it is different for non-consumer use cases. Maybe there are industrial areas (e.g. coding) where it is more plausible that buyers would pay 100x or 1000x as much for the same number of queries to a somewhat more intelligent system. I’m a bit skeptical, though. I really think scaling paying for itself so far was driven by being able to scale up the number of users and the number of queries per API user, and both of those levers stop working in this regime, which is a big deal.