I wonder how much we need to worry about hybrid architectures. If LLMs do text generation well and continuous learning models do spatial reasoning well, and someone figures out an architecture that lets their strengths synergize with each other...
That is the basic idea behind Energy-Based Transformers and test-time training!