Yeah, this sort of thing, if it actually scales and can be adapted to other paradigms (like combining it with RNNs or transformers), would be the final breakthrough sufficient for AGI. As I've said, one of the things keeping LLM agents from being better is their inability to hold memory/state, which cripples meta-learning (without expensive compute investment), and this new paper is possibly a first step toward the return of recurrence/RNN architectures.