Q Home comments on Clarifying the Agent-Like Structure Problem

Q Home 4 Jun 2025 4:42 UTC
> Then you can use the three dot points in my comment to construct source code for a new agent that does the same thing, but is not nicely separated.
This is the step I don’t get (how the construction is actually done), because I don’t understand SGD well. What does “sample N world models” mean?
My attempt to understand: we have a space of world models ($S_m$) and a space of plans ($S_p$). We pick points from $S_p$ (using SGD) and evaluate them against the best points of $S_m$ (which we found by trying to predict the world and applying SGD).
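A toy Python sketch of this reading, purely to make it concrete. It is my own assumption, not the construction from the parent comment. Everything here is hypothetical: a "world model" is a single scalar `theta` predicting `outcome = theta * action`, a "plan" is a single scalar action, and the goal is to hit a `target` outcome. "Sample N world models" is interpreted as N random initializations, each fit to observations by SGD, with the `k_best` lowest-loss models retained.

```python
import random


def fit_world_model(theta, data, rng, lr=0.01, steps=200):
    """SGD on squared prediction error for the toy model outcome = theta * action."""
    for _ in range(steps):
        action, outcome = rng.choice(data)
        grad = 2 * (theta * action - outcome) * action  # d/dtheta of (theta*a - y)^2
        theta -= lr * grad
    return theta


def prediction_loss(theta, data):
    """Mean squared prediction error of one world model over all observations."""
    return sum((theta * a - y) ** 2 for a, y in data) / len(data)


def optimize_plan(models, target, rng, lr=0.01, steps=500):
    """SGD over the plan space S_p, scoring plans against the retained world models."""
    plan = 0.0
    for _ in range(steps):
        theta = rng.choice(models)  # stochastic: one retained world model per step
        grad = 2 * (theta * plan - target) * theta  # d/dplan of (theta*plan - target)^2
        plan -= lr * grad
    return plan


def agent(data, target, n_models=10, k_best=3, seed=0):
    rng = random.Random(seed)
    # "Sample N world models": random starting points in S_m, each refined by SGD.
    starts = [rng.uniform(-5.0, 5.0) for _ in range(n_models)]
    fitted = [fit_world_model(t, data, rng) for t in starts]
    # Keep the best points of S_m (lowest prediction loss).
    best = sorted(fitted, key=lambda t: prediction_loss(t, data))[:k_best]
    # In this sketch the retained models and the current plan are separate variables.
    plan = optimize_plan(best, target, rng)
    return best, plan


if __name__ == "__main__":
    observations = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # true world: outcome = 2 * action
    models, plan = agent(observations, target=10.0)
    print(models, plan)  # models near 2.0, plan near 5.0
```

In this naive version, modelling and planning are cleanly separated (two loops, two stored variables), so it does not yet show the "not nicely separated" agent the parent comment describes.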
My thoughts/questions: to find the best points of $S_m$, don’t we still need to do modelling independently of planning? Even if the world model itself isn’t stored in memory, isn’t some pointer to the best points of $S_m$ stored? Don’t we at least store “the best current plan” independently of the world models?