RL will be good enough to turn LLMs into reliable tools for some fixed environments/tasks. They will reliably fall flat on their faces if moved outside those environments/tasks.
They don’t have to “move outside those tasks” if they can be JIT-trained for cheap. It is the outer system that requests and produces them that is general (or, one might say, “specialized in adaptation”).