I’m curious about how (or whether) goal coherence over long-term plans is explained by your “planning as reward shaping” model. Suppose planning amounts to an escalation towards more and more “real” thoughts (e.g. I’m idly thinking about prinsesstårta → a forkful of prinsesstårta is heading towards my mouth), because these correspond to stronger activations of a valenced latent in my world model, and my thought generator is biased towards producing higher-valence thoughts. Then it’s unclear to me why we wouldn’t just default to producing off-topic thoughts (e.g. I’m idly thinking about prinsesstårta → I’m thinking about being underneath a weighted blanket) and never get anything done in the world.
One reply would be to bite the bullet and say yup, humans do in fact have deficits in their long-term planning, and this accounts for them. But this feels unsatisfying; if the story given in my comment above were the only mechanism, I’d expect us to be much worse at it. Another possible reply is that “non-real thoughts” don’t reliably lead to rewards from the steering subsystem, so the thought assessors down-weight the valence associated with these thoughts, and they get generated less frequently; consequently, the only thought sequences that remain are ones which terminate in “real thoughts” and stimulate accurate predictions of the steering subsystem. This seems plausibly sufficient, but it still doesn’t answer the question of why people don’t arbitrarily switch into “equally real but non-topical” thought sequences more often.
Thanks! I’m not 100% sure what you’re getting at, so here are some possible comparisons:
“idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to eat the prinsesstårta on my plate right now” → latter is preferred
“idle musing about eating prinsesstårta sometime in the future” VERSUS “idle musing about snuggling under a weighted blanket sometime in the future” → either might be preferred, depending on which has higher valence, which in turn depends on whether I’m hungry or tired etc.
“idle musing about eating prinsesstårta sometime in the future” VERSUS “plan to snuggle under a weighted blanket right now” → again, either might be preferable, but compared to the previous bullet point, the latter is likelier to win, because it’s extra-appealing from its immediacy.
I think this is consistent with experience, right?
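To make the three comparisons above concrete, here is a toy sketch. Everything in it is an assumption for illustration only: the `Thought` class, the `valence` and `preferred` functions, the drive strengths, and the `IMMEDIACY_BOOST` multiplier are all made up, and a single multiplier is a crude stand-in for "extra-appealing from its immediacy", not a claim about how the brain computes valence.

```python
from dataclasses import dataclass

# Hypothetical multiplier for "plan to do it right now" vs. idle musing.
IMMEDIACY_BOOST = 1.5


@dataclass(frozen=True)
class Thought:
    goal: str        # what the thought is about, e.g. "cake" or "blanket"
    immediate: bool  # True = concrete plan right now, False = idle musing


def valence(thought, drives):
    """Toy valence: current drive strength for the goal, boosted if immediate."""
    base = drives[thought.goal]
    return base * (IMMEDIACY_BOOST if thought.immediate else 1.0)


def preferred(a, b, drives):
    """Return whichever thought has the higher toy valence (a wins ties)."""
    return a if valence(a, drives) >= valence(b, drives) else b


musing_cake = Thought("cake", immediate=False)
plan_cake = Thought("cake", immediate=True)
musing_blanket = Thought("blanket", immediate=False)
plan_blanket = Thought("blanket", immediate=True)

# Made-up drive states: somewhat hungry vs. quite tired.
hungry = {"cake": 0.8, "blanket": 0.6}
tired = {"cake": 0.4, "blanket": 0.8}

# 1. Musing vs. immediate plan for the same goal: the plan wins.
print(preferred(musing_cake, plan_cake, hungry))
# 2. Musing vs. musing: depends on the current drive state.
print(preferred(musing_cake, musing_blanket, hungry))
print(preferred(musing_cake, musing_blanket, tired))
# 3. Musing vs. immediate plan for a different goal: immediacy can tip it.
print(preferred(musing_cake, plan_blanket, hungry))
```

With these particular numbers, the hungry case prefers musing about cake over musing about the blanket, but prefers the immediate blanket plan over musing about cake, which is the shift described in the third comparison.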
But maybe you’re instead talking about this comparison:
“idle musing about eating prinsesstårta sometime in the future” VERSUS “thinking about the fact that I am right now under a cozy weighted blanket” …
I think the latter thought here doesn’t have much positive valence. I think, when we say we “enjoy” being under a weighted blanket, the pleasure signal is more like “transient pleasure upon starting to be under the blanket, and transient displeasure upon stopping, but not really continuous pleasure during the process, or at least not so much pleasure that we just dwell on that feeling; instead, our mind starts wandering elsewhere (partly due to boredom).” Not many experiences are so pleasurable that we really meditate on them for an extended period, at least not without deliberate effort towards mindfulness. Right?
Or if I’m still misunderstanding, can you try again?