> The pre-training phase is already finding a mesa-optimizer that does induction in context. I usually think of this as something like Solomonoff induction with a good inductive bias, but probably you would expect something more like logical induction. I expect the answer to be somewhere in between.
I don’t personally imagine current LLMs are doing approximate logical induction (or approximate Solomonoff induction) internally. I think of the base model as resembling a circuit prior updated on the data. The circuits that come out on top after the update also do some induction of their own internally, but it is harder to say exactly what form of inductive bias they have (it would seem like a coincidence if that, too, happened to be well-modeled as a circuit prior, but it must be something similarly computationally limited, as opposed to Solomonoff-like).
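To gesture at what I mean by a circuit prior updated on the data, here is a toy sketch (entirely my own illustration, with a hypothetical hand-picked hypothesis class, not a claim about real training dynamics): a size-weighted prior over small boolean circuits, conditioned on observed input/output pairs. Inconsistent circuits are zeroed out and the remaining mass concentrates on the simplest consistent ones.

```python
# Toy "circuit prior" as a Bayesian update over a tiny hypothesis class.
# Hypotheses are boolean functions of two bits; prior weight decays with
# gate count, and observing (input, output) pairs kills inconsistent circuits.

# Hypothetical hypothesis class: (name, gate_count, function).
CIRCUITS = [
    ("const0", 0, lambda a, b: 0),
    ("const1", 0, lambda a, b: 1),
    ("id_a",   0, lambda a, b: a),
    ("id_b",   0, lambda a, b: b),
    ("and",    1, lambda a, b: a & b),
    ("or",     1, lambda a, b: a | b),
    ("xor",    2, lambda a, b: a ^ b),
    ("nand",   2, lambda a, b: 1 - (a & b)),
]

def posterior(data):
    """Update the size-weighted prior on observed ((a, b), y) pairs."""
    weights = {}
    for name, size, f in CIRCUITS:
        prior = 2.0 ** (-size)  # simpler circuits start out favored
        consistent = all(f(a, b) == y for (a, b), y in data)
        weights[name] = prior if consistent else 0.0
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Observing (0,0) -> 0 and (1,1) -> 1 rules out the constants, xor, and nand;
# the simple consistent circuits (id_a, id_b) get most of the remaining mass.
post = posterior([((0, 0), 0), ((1, 1), 1)])
```

The point of the sketch is only that "update a simplicity-weighted prior over a computationally limited model class" is a well-defined operation, quite different from Solomonoff induction over all programs.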
I hesitate to call this a mesa-optimizer. Although good epistemics involves agency in principle (especially time-bounded epistemics), I think we can sensibly differentiate between mesa-optimizers and mere mesa-induction. But perhaps you intended this stronger reading, in support of your argument. If so, I’m not sure why you believe this. (No, I don’t find “planning ahead” results to be convincing—I feel this can still be purely epistemic in a relevant sense.)
Perhaps it suffices for your purposes to observe that good epistemics involves agency in principle?
Anyway, cutting more directly to the point:
I think you lack imagination when you say
> [...] which can realistically compete with modern LLMs would ultimately look a lot like a semi-theoretically-justified modification to the loss function or optimizer of agentic fine-tuning / RL or possibly its scaffolding [...]
I think there are neural architectures close to the current paradigm which don’t directly train whole chains-of-thought on a reinforcement signal to achieve agenticness. This paradigm is analogous to model-free reinforcement learning. What I would suggest is more analogous to model-based reinforcement learning, with corresponding benefits to transparency. (Super speculative, of course.)
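To make the analogy concrete, here is a toy contrast (my own standard textbook example, not the architecture I am gesturing at) on a four-state chain: model-free Q-learning adjusts action values directly from reward, while the model-based variant first builds an explicit transition/reward table and then plans against it. The learned table is an inspectable artifact, which is the transparency point.

```python
import random

# Tiny deterministic chain MDP: states 0..3, actions {0: left, 1: right},
# reward 1.0 only for landing in state 3.
N_STATES, GOAL = 4, 3

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, 1.0 if s2 == GOAL else 0.0

def model_free(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Q-learning: action values are learned directly; no explicit world model."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        for _ in range(10):
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2, r = step(s, a)
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

def model_based(gamma=0.9, sweeps=50):
    """Learn the model by querying transitions, then plan by value iteration.
    The (s, a) -> (s', r) table is explicit and human-readable."""
    model = {(s, a): step(s, a) for s in range(N_STATES) for a in (0, 1)}
    V = [0.0] * N_STATES
    for _ in range(sweeps):
        V = [max(r + gamma * V[s2] for (s2, r) in (model[(s, 0)], model[(s, 1)]))
             for s in range(N_STATES)]
    policy = [max((0, 1), key=lambda a: model[(s, a)][1] + gamma * V[model[(s, a)][0]])
              for s in range(N_STATES)]
    return model, policy

model, policy = model_based()  # policy is "go right" everywhere
```

Both approaches can learn the same policy; the difference I care about is that in the second one the agent's beliefs about the world sit in a separate, auditable object rather than being smeared into the value estimates.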
EDIT: I think that I miscommunicated a bit initially and suggest reading my response to Vanessa before this comment for necessary context.
> I hesitate to call this a mesa-optimizer. Although good epistemics involves agency in principle (especially time-bounded epistemics), I think we can sensibly differentiate between mesa-optimizers and mere mesa-induction. But perhaps you intended this stronger reading, in support of your argument. If so, I’m not sure why you believe this. (No, I don’t find “planning ahead” results to be convincing—I feel this can still be purely epistemic in a relevant sense.)
I am fine with using the term mesa-induction. I think induction is a restricted type of optimization, but I suppose you associate the term mesa-optimizer with agency, and that is not my intended message.
> I think there are neural architectures close to the current paradigm which don’t directly train whole chains-of-thought on a reinforcement signal to achieve agenticness. This paradigm is analogous to model-free reinforcement learning. What I would suggest is more analogous to model-based reinforcement learning, with corresponding benefits to transparency. (Super speculative, of course.)
I don’t think the chain of thought is necessary, but routing through pure sequence prediction in some fashion seems important for the current paradigm (that is what I call scaffolding). I expect it is possible in principle to avoid this and do straight model-based RL, but forcing that approach to quickly catch up with LLMs / foundation models seems very hard, and not necessarily desirable. In fact, by default this seems bad for transparency, though perhaps some IB-inspired architecture would be more transparent.
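To make "scaffolding" concrete, a minimal toy sketch (all names hypothetical, my own): agency lives in an outer loop, while all of the cognition routes through a pure next-token-style predictor that is never itself trained on a reward signal. Here the predictor is a trivial hand-written rule standing in for a frozen sequence model.

```python
def sequence_model(transcript):
    """Stand-in for a frozen pretrained predictor: maps a transcript prefix
    to the next chunk of text. Here it is a trivial hand-written rule."""
    if not transcript or transcript[-1].startswith("observation:"):
        return "action: look around"
    return "thought: waiting for an observation"

def scaffold(env_step, max_turns=4):
    """Outer agent loop: call the predictor, execute any proposed action,
    append the observation, repeat. The transcript is fully inspectable."""
    transcript = ["observation: episode start"]
    for _ in range(max_turns):
        move = sequence_model(transcript)
        transcript.append(move)
        if move.startswith("action:"):
            transcript.append("observation: " + env_step(move))
    return transcript

log = scaffold(lambda action: "nothing happens")
```

The reinforcement signal, if any, shapes the outer loop or selects among transcripts; the predictor itself only ever does sequence prediction, which is the sense in which the current paradigm routes through it.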