EDIT: I think that I miscommunicated a bit initially and suggest reading my response to Vanessa before this comment for necessary context.
I hesitate to call this a mesa-optimizer. Although good epistemics involves agency in principle (especially time-bounded epistemics), I think we can sensibly differentiate between mesa-optimizers and mere mesa-induction. But perhaps you intended this stronger reading, in support of your argument. If so, I’m not sure why you believe this. (No, I don’t find “planning ahead” results to be convincing—I feel this can still be purely epistemic in a relevant sense.)
I am fine with using the term mesa-induction. I think induction is a restricted type of optimization, but I take it you associate the term mesa-optimizer with agency, and agency is not part of my intended claim.
I think there are neural architectures close to the current paradigm which don’t directly train whole chains of thought on a reinforcement signal to achieve agentic behavior. Training chains of thought directly on reward is analogous to model-free reinforcement learning. What I would suggest is more analogous to model-based reinforcement learning, with corresponding benefits to transparency. (Super speculative, of course.)
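To make the analogy concrete, here is a toy sketch of the contrast (a made-up two-armed bandit with invented names and update rules, purely illustrative, not anyone’s actual proposal). The point is only where the optimization lives: in the model-free flavour the learned component is itself shaped by reward, while in the model-based flavour the learned component only predicts and an explicit, visible outer step does the choosing.

```python
import random

ACTIONS = ["left", "right"]
TRUE_REWARD = {"left": 0.2, "right": 0.8}     # hidden environment parameters

def env_step(action):
    return 1.0 if random.random() < TRUE_REWARD[action] else 0.0

# "Model-free" flavour: the sampled behaviour itself is pushed toward reward.
prefs = {a: 0.0 for a in ACTIONS}

def sample_action():
    weights = [2.718 ** prefs[a] for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

for _ in range(500):
    a = sample_action()
    prefs[a] += 0.1 * (env_step(a) - 0.5)     # crude policy-gradient-style update

# "Model-based" flavour: the learned component only *predicts*; an explicit,
# legible outer step does the optimization.
counts = {a: [0.0, 0] for a in ACTIONS}       # [summed reward, trials]
for _ in range(500):
    a = random.choice(ACTIONS)                # data gathering, no optimization pressure
    counts[a][0] += env_step(a)
    counts[a][1] += 1

predicted = {a: counts[a][0] / counts[a][1] for a in ACTIONS}
plan = max(predicted, key=predicted.get)      # the "agency" is this one visible line

print("model-free preferences:", prefs)
print("model-based predictions:", predicted, "-> chosen action:", plan)
```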
I don’t think the chain of thought is necessary, but routing through pure sequence prediction in some fashion seems important for the current paradigm (that is what I call scaffolding). I expect that it is possible in principle to avoid this and do straight model-based RL, but forcing that approach to quickly catch up with LLMs / foundation models seems very hard and not necessarily desirable. In fact, straight model-based RL seems bad for transparency by default, though perhaps some IB-inspired architecture would be more transparent.
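Very roughly, here is a toy picture of the sort of scaffolding I mean by “routing through pure sequence prediction” (everything below is a made-up stub for illustration, not a real architecture; in practice the predictor would be an LLM / foundation-model call). The transparency point is that nothing is trained end-to-end on reward here: the predictor is only queried, and whatever plan emerges exists as plain text in the trace.

```python
def sequence_predictor(prompt):
    """Stub for a pure next-text predictor (no reward signal, no hidden state)."""
    canned = {
        "goal: boil water\nstep 1:": "fill the kettle",
        "goal: boil water\nstep 2:": "switch the kettle on",
    }
    return canned.get(prompt, "done")

def scaffolded_agent(goal, max_steps=5):
    trace = []                                   # every query and answer is kept
    for i in range(1, max_steps + 1):
        prompt = f"goal: {goal}\nstep {i}:"
        suggestion = sequence_predictor(prompt)  # prediction only: no gradients, no reward
        trace.append((prompt, suggestion))
        if suggestion == "done":
            break
    return trace

for prompt, suggestion in scaffolded_agent("boil water"):
    print(prompt.splitlines()[-1], "->", suggestion)   # the whole "reasoning" is inspectable
```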