The point of this sentence is that it was ever the case that it's that simple, not to argue that, from the current point in time, we're just waiting on the next 10x scale-up in training compute (we're not). Any new paradigm is likely to create wiggle room to scale a single variable and receive returns (indeed, some engineers, and maybe many, index on ease of scalability when deciding which approaches to prioritize, since this makes them the right combination of cheap to test and highly effective).
Which no one does anymore because it doesn’t work...