OK, suppose you have an agent that does perform remote control by explicitly forming a world model and then explicitly running a search process. I claim that if both the world modelling process and the planning process are some kind of gradient descent, then I can construct an agent that performs the same remote control without ever explicitly forming a world model (though it will in general require more computation).
Start with some initial plan. Then repeat:
1. Sample N world models.
2. Evaluate the modelling objective for each one.
3. For each of the N world models, perform a gradient step on the plan, weighted according to the modelling objective.
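The loop above can be sketched concretely. This is a minimal toy version, with everything (the scalar world parameter, the Gaussian modelling objective, the quadratic planning reward) invented purely for illustration, not taken from the original argument:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: a single unknown scalar theta, about which we have noisy data.
true_theta = 2.0
data = true_theta + 0.1 * rng.standard_normal(20)

def modelling_objective(theta):
    # Log-likelihood of the observed data under a sampled world model.
    return -np.sum((data - theta) ** 2)

def plan_grad(plan, theta):
    # Planning reward R(plan, theta) = -(plan - theta)^2: the best plan
    # matches the (unknown) world parameter. This is its gradient in plan.
    return -2.0 * (plan - theta)

plan = 0.0   # initial plan
lr = 0.05
N = 32

for _ in range(500):
    thetas = rng.normal(0.0, 3.0, size=N)              # sample N world models
    logw = np.array([modelling_objective(t) for t in thetas])
    w = np.exp(logw - logw.max())
    w /= w.sum()                                        # normalised weights
    # One weighted gradient step on the plan. Note that no single "best"
    # world model, and no explicit posterior, is ever stored.
    plan += lr * np.sum(w * plan_grad(plan, thetas))

# The plan drifts toward the posterior-weighted optimum near true_theta.
```

Only the running plan and each iteration's transient samples live in memory, which is the point: the "world model" exists only implicitly, in how the weights shape each step.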
This algorithm never stores a “current best” world model, nor a distribution over possible world models. Yet by standard SGD convergence arguments it should converge to the same plan you would get by first computing an explicit probability distribution over world models (using the modelling objective as the probability measure) and then running the planning search against that distribution.
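The equivalence claim can be checked numerically in the simplest case. The check below is my own illustration (the quadratic planning reward and the grid posterior are assumptions, not from the original text): for a per-model reward R(plan, theta) = -(plan - theta)^2, the posterior-weighted average of per-model gradients equals the gradient of the posterior-averaged objective, which is exactly what the sampling loop is estimating:

```python
import numpy as np

# An explicit posterior over world models, built on a fine grid.
thetas = np.linspace(-5.0, 5.0, 1001)
log_post = -(thetas - 2.0) ** 2 / 0.5      # unnormalised log modelling objective
post = np.exp(log_post - log_post.max())
post /= post.sum()

plan = 0.7
# The gradient of E_post[R(plan, theta)] w.r.t. plan, computed two ways:
weighted_grads = np.sum(post * (-2.0 * (plan - thetas)))  # average the per-model gradients
grad_of_mean = -2.0 * (plan - np.sum(post * thetas))      # differentiate the averaged objective
assert np.isclose(weighted_grads, grad_of_mean)
```

For gradients linear in theta the two quantities agree exactly; more generally the sampled, weighted steps are unbiased estimates of the explicit-posterior gradient, which is what the SGD convergence argument needs.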
If we were given only the machine code for the algorithm above, it might be very difficult to disentangle which parts are the “world model” and which are the “planning”. You would certainly never find, stored anywhere in memory, a world model representing the agent’s overall beliefs given the available data.
I suspect that I could further entangle the world modelling and planning parts of the algorithm above, to the point that it would be very difficult to objectively say that the algorithm is “really” forming a world model.
It would be very interesting, however, to show that the most computationally efficient version of the above necessarily disentangles world modelling from planning.