My view is that LLMs can’t do the kind of context management that long tasks require. Training approaches rely on the model completing the whole task within its context window. Long-context tasks like reading a large document to answer questions are not the same as coding an entire software library from scratch.
All the RL training I know of still involves trajectories that fit within a single context window, so there is no optimization pressure for models to learn context management, unlike humans, who develop strategies like note-taking.
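To make the note-taking strategy concrete, here is a minimal, hypothetical sketch: an agent keeps compact notes alongside its raw transcript and rebuilds a bounded context from the notes rather than the full history. All names and the character budget are illustrative assumptions, not any real system's API.

```python
# Hypothetical sketch of note-taking as context management. The agent's raw
# transcript grows without bound, but the context it actually "sees" is
# rebuilt from compact notes and capped at a fixed budget.

MAX_CONTEXT_CHARS = 200  # illustrative stand-in for a context window limit

class NoteTakingAgent:
    def __init__(self):
        self.notes = []       # durable, compact summaries ("external memory")
        self.transcript = []  # raw step-by-step history, unbounded

    def record_step(self, raw_output, summary):
        self.transcript.append(raw_output)
        self.notes.append(summary)

    def build_context(self):
        # Walk notes from newest to oldest, prepending until the budget
        # is exceeded; older notes simply fall out of context.
        context = ""
        for note in reversed(self.notes):
            candidate = note + "\n" + context
            if len(candidate) > MAX_CONTEXT_CHARS:
                break
            context = candidate
        return context

agent = NoteTakingAgent()
for i in range(50):
    agent.record_step(raw_output="...long tool output..." * 20,
                      summary=f"step {i}: done")
ctx = agent.build_context()
print(len(ctx))  # stays within the budget even after 50 steps
```

The point of the sketch is that the bounded rebuild, not the transcript, is what the model would condition on; current RL setups have no reason to learn this behavior because the whole trajectory already fits in the window.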