From reading, I imagined a memory+cache structure, rather than something closer to “cache all the way down”.
Note that the things being cached are not things stored in memory elsewhere. Rather, they’re (supposedly) outputs of costly-to-compute functions—e.g. the instrumental value of something would be costly to compute directly from our terminal goals and world model. And most of the values in cache are computed from other cached values, rather than “from scratch”—e.g. the instrumental value of X might be computed (and then cached) from the already-cached instrumental values of some stuff which X costs/provides.
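To make the "computed from other cached values" part concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the item names, the numbers, and the simple "value of what it provides minus value of what it costs" rule are my assumptions, not a claim about how the computation actually works.

```python
from typing import Dict, List, Tuple

# Hypothetical cache of instrumental values. In this picture there is no
# separate "memory" holding the true values elsewhere; the cache is all there is.
cache: Dict[str, float] = {"money": 1.0, "time": 20.0}

Bundle = List[Tuple[str, float]]  # (item name, quantity) pairs


def value_from_cache(provides: Bundle, costs: Bundle) -> float:
    """Estimate an instrumental value using only already-cached values.

    Assumed rule (illustrative only): cached value of what the thing
    provides, minus cached value of what it costs. Terminal goals and the
    world model are never consulted here.
    """
    gained = sum(cache[name] * qty for name, qty in provides)
    spent = sum(cache[name] * qty for name, qty in costs)
    return gained - spent


# A work shift provides 100 units of money and costs 3 units of time.
cache["job_shift"] = value_from_cache([("money", 100.0)], [("time", 3.0)])

# Training provides 20 future shifts and costs 500 units of money.
# Its value is built on "job_shift", which was itself computed from the cache.
cache["training"] = value_from_cache([("job_shift", 20.0)], [("money", 500.0)])
```

Note that "training" is two steps removed from the seed entries, and at no point does anything get recomputed "from scratch".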
Coherence of Caches and Agents goes into more detail on that part of the picture, if you’re interested.