Another section worth noticing says that even if the prior is doing most of the work, the marginal sample efficiency of AIs is also very bad, because they need 100x more marginal data than humans:
Even if it were the case that we can explain away the trillions of tokens required to pretrain a base model as catching up to evolution, it doesn’t explain why the marginal capabilities take so much data—once you have been educated, you don’t need 100 different professors to learn a new programming language, but the AIs (even once pretrained) do.
I have not seen any research backing this claim up, have you?