There are architectures which have constant memory and compute usage per token, but they are not used in practice. Ultimately I expect something like this to work; the human brain is an existence proof.
There are architectures which have constant memory and compute usage per token, but they are not used in practice. Ultimately I expect something like this to work; the human brain is an existence proof.