My take is that the big algorithmic difference behind a lot of weird LLM deficits, and plausibly behind the post's findings, is that current neural networks do not learn at run-time: their weights are frozen after training. This is a central reason humans outperform LLMs at longer tasks, since humans, like many other animals, can learn on the fly.
Unfortunately, this ability generally declines gradually starting in one's 20s, but the existence of non-trivial run-time learning still goes a long way toward explaining why humans currently succeed at longer tasks where AIs do not.
And thus, if OpenAI or Anthropic had found the secret to lifelong learning, that would explain the hype (though I personally place very low probability on their having succeeded at this for anything other than math or coding/software).
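To make the frozen-weights point concrete, here is a minimal toy sketch (my own illustration, not any lab's actual setup): a linear "model" queried repeatedly with frozen weights gives the same wrong answer every time, while a single run-time gradient rule drives the error down.

```python
def predict(w, x):
    # Toy linear model: the "deployed" forward pass.
    return sum(wi * xi for wi, xi in zip(w, x))

def sgd_step(w, x, target, lr=0.1):
    # One run-time learning step: nudge weights toward the target.
    err = predict(w, x) - target
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

w_frozen = [0.5, -0.2]
x, target = [1.0, 2.0], 1.0

# Frozen weights: the same mistake repeats on every call.
assert predict(w_frozen, x) == predict(w_frozen, x)

# Run-time learning: the error shrinks with each update.
w = list(w_frozen)
for _ in range(50):
    w = sgd_step(w, x, target)
print(abs(predict(w, x) - target) < 1e-3)  # prints True
```

The point of the sketch is only the asymmetry: the frozen model's error is constant across calls, while the updating model's error decays geometrically, which is the kind of within-task improvement humans get for free on long tasks.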
Gwern explains below:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks#hSkQG2N8rkKXosLEF