There are two key subquestions here: the scaling function of better at X with respect to net training compute, and what exactly X entails.
The X here is ‘predict internet text’, not “generate new highly valuable research etc”, and success at the latter likely requires combining LLMs with at least planning/search.
There are two key subquestions here: the scaling function of better at X with respect to net training compute, and what exactly X entails.
The X here is ‘predict internet text’, not “generate new highly valuable research etc”, and success at the latter likely requires combining LLMs with at least planning/search.