Then you can use the three dot points in my comment to construct source code for a new agent that does the same thing, but is not nicely separated.
This is the step I don’t get (how we make the construction), because I don’t understand SGD well. What does “sample N world models” mean?
My attempt to understand: We have a space of world models (Sm) and a space of plans (Sp). We pick points from Sp (using SGD) and evaluate them on the best points of Sm (we got those best points by trying to predict the world and applying SGD).
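To make my reading concrete, here is a toy sketch of it. This is only my guess at what is meant, not the construction from your comment. The names `fit_world_model` and `plan_against_model`, the one-parameter linear "world", and the squared-error objective are all assumptions I made up for illustration. The planning step uses plain (deterministic) gradient descent rather than true SGD, to keep the example short.

```python
import random

def fit_world_model(data, steps=2000, lr=0.01, seed=0):
    """SGD over the model space Sm: find w so that w * x predicts y.

    Hypothetical toy setup: a "world model" is a single number w.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)        # one random observation per step
        grad = 2 * (w * x - y) * x     # gradient of the squared prediction error
        w -= lr * grad
    return w

def plan_against_model(w, target, steps=500, lr=0.01):
    """Gradient descent over the plan space Sp, scored by the fitted model.

    Hypothetical toy setup: a "plan" is a single action a, and the model
    predicts the outcome w * a, which we want to equal `target`.
    """
    a = 0.0
    for _ in range(steps):
        grad = 2 * (w * a - target) * w  # gradient of (predicted outcome - target)^2
        a -= lr * grad
    return a

# Observations from a world where y = 3 * x (assumed for the example).
data = [(x, 3.0 * x) for x in (-2.0, -1.0, 1.0, 2.0)]

w = fit_world_model(data)                 # modelling: best point found in Sm
plan = plan_against_model(w, target=6.0)  # planning: best point found in Sp
print(w, plan)
```

In this sketch the modelling step runs to completion before planning starts, and both `w` and `plan` are stored as separate values. That is exactly the "nicely separated" structure, which is why I don't see how the construction avoids it.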
My thoughts/questions: To find the best points of Sm, do we still need to do modelling independently from planning? Even if the world model is not stored in memory, is some pointer to the best points of Sm stored? Do we at least have “the best current plan” stored independently from the world models?