OK, suppose you have an agent that does perform remote control by explicitly forming a world model and then explicitly running a search process. I claim that if both the world modelling process and the planning process are some kind of gradient descent, then I can construct an agent that performs the same remote control without ever explicitly forming a world model (though it will in general require more computation).
Start with some initial plan. Then repeat:
1. Sample N world models.
2. Evaluate the modelling objective for each one.
3. For each of the N world models, perform a gradient step on the plan, weighted according to the modelling objective.
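The loop above can be sketched concretely. This is a minimal toy version, with everything (the scalar world parameter, the Gaussian modelling objective, the quadratic planning reward) invented purely for illustration, not taken from the original argument:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: a single unknown scalar theta, about which we have noisy data.
true_theta = 2.0
data = true_theta + 0.1 * rng.standard_normal(20)

def modelling_objective(theta):
    # Log-likelihood of the observed data under a sampled world model.
    return -np.sum((data - theta) ** 2)

def plan_grad(plan, theta):
    # Planning reward R(plan, theta) = -(plan - theta)^2: the best plan
    # matches the (unknown) world parameter. This is its gradient in plan.
    return -2.0 * (plan - theta)

plan = 0.0   # initial plan
lr = 0.05
N = 32

for _ in range(500):
    thetas = rng.normal(0.0, 3.0, size=N)              # sample N world models
    logw = np.array([modelling_objective(t) for t in thetas])
    w = np.exp(logw - logw.max())
    w /= w.sum()                                        # normalised weights
    # One weighted gradient step on the plan. Note that no single "best"
    # world model, and no explicit posterior, is ever stored.
    plan += lr * np.sum(w * plan_grad(plan, thetas))

# The plan drifts toward the posterior-weighted optimum near true_theta.
```

Only the running plan and each iteration's transient samples live in memory, which is the point: the "world model" exists only implicitly, in how the weights shape each step.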
This algorithm never stores a “current best” world model, nor a distribution over possible world models. Yet by standard SGD convergence arguments it should converge to the same plan you would get by first computing an explicit probability distribution over world models (using the modelling objective as the probability measure) and then running the planning search against that distribution.
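The equivalence claim can be checked numerically in the simplest case. The check below is my own illustration (the quadratic planning reward and the grid posterior are assumptions, not from the original text): for a per-model reward R(plan, theta) = -(plan - theta)^2, the posterior-weighted average of per-model gradients equals the gradient of the posterior-averaged objective, which is exactly what the sampling loop is estimating:

```python
import numpy as np

# An explicit posterior over world models, built on a fine grid.
thetas = np.linspace(-5.0, 5.0, 1001)
log_post = -(thetas - 2.0) ** 2 / 0.5      # unnormalised log modelling objective
post = np.exp(log_post - log_post.max())
post /= post.sum()

plan = 0.7
# The gradient of E_post[R(plan, theta)] w.r.t. plan, computed two ways:
weighted_grads = np.sum(post * (-2.0 * (plan - thetas)))  # average the per-model gradients
grad_of_mean = -2.0 * (plan - np.sum(post * thetas))      # differentiate the averaged objective
assert np.isclose(weighted_grads, grad_of_mean)
```

For gradients linear in theta the two quantities agree exactly; more generally the sampled, weighted steps are unbiased estimates of the explicit-posterior gradient, which is what the SGD convergence argument needs.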
If we were given only the machine code for the algorithm above, it might be very difficult to disentangle which parts are the “world model” and which are the “planning”. You would certainly never find, stored anywhere in memory, a world model representing the agent’s overall beliefs given the available data.
I suspect that I could further entangle the world modelling and planning parts of the algorithm above, to the point that it would be very difficult to objectively say that the algorithm is “really” forming a world model.
It would be very interesting, however, to show that the most computationally efficient version of the above necessarily disentangles world modelling from planning.