Hardwired priors. Humans get them, LLMs don’t.
There aren’t enough samples in “human training data” to train humans for agentic long-horizon tasks either. There are, however, plenty of those samples in human evolutionary history. That’s why humans have an instinctual grasp of how to accomplish long-term goals, a foundation that any further training builds upon.
LLMs don’t get that “for free”, and they struggle to learn it from random pre-training text.
Normally, the AI answer to “no hardwired priors” is “reconstruct them the hard way from a vast dataset”. But there’s no good in-domain dataset of long-horizon agentic behavior for LLMs to learn from. So they’re stuck with whatever dregs of long-term agency the mix of pre-training, SFT, and RLVR left them with.