Epistemic status: kinda vibesy
My general hypothesis on this front is that the brain’s planning modules are doing something like RL as inference, but that they’re sometimes a bit sloppy about properly labelling which things are and aren’t under their own control.
To elaborate: in RL as inference, you consider a “prior” over some number of input → action → outcome loops, and then perform a Bayesian-ish update towards outcomes which get high reward. But you have to constrain that update to change only the P(action | input) values, while keeping the P(outcome | action) and P(input | outcome) values fixed. In this case, the brain is sloppy about that labelling and treats P(tired | stay up) as something it can influence.
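To make the distinction concrete, here’s a toy numerical sketch (the two-action setup and all numbers are made up for illustration). The proper RL-as-inference update only reweights the policy P(action | input) by the expected exponentiated reward; the “sloppy” version conditions the whole joint on high reward and then reads the dynamics back off the posterior, which drags P(tired | stay up) towards whatever would be convenient:

```python
import numpy as np

# Toy sketch of the framing above; all numbers are made up for illustration.
# One evening "input" state, actions = [stay_up, sleep_early],
# outcomes tomorrow = [tired, rested].

policy = np.array([0.5, 0.5])         # P(action | input)
dynamics = np.array([[0.9, 0.1],      # P(outcome | stay_up)
                     [0.1, 0.9]])     # P(outcome | sleep_early)
reward = np.array([[1.0, 3.0],        # reward(stay_up, tired), reward(stay_up, rested)
                   [-1.0, 1.0]])      # reward(sleep_early, tired), reward(sleep_early, rested)

# Proper RL-as-inference update: only P(action | input) is allowed to move.
# new_policy(a) ∝ policy(a) * E_{o ~ P(o|a)}[exp(reward(a, o))]
soft_value = (dynamics * np.exp(reward)).sum(axis=1)
new_policy = policy * soft_value
new_policy /= new_policy.sum()

# Sloppy update: condition the whole joint on high reward, then read the
# "dynamics" back off the posterior, as if P(tired | stay_up) were a choice.
joint_posterior = policy[:, None] * dynamics * np.exp(reward)
joint_posterior /= joint_posterior.sum()
sloppy_dynamics = joint_posterior / joint_posterior.sum(axis=1, keepdims=True)

print("proper update, P(stay_up):       ", new_policy[0])          # ≈ 0.64
print("true P(tired | stay_up):         ", dynamics[0, 0])         # 0.9
print("sloppy belief P(tired | stay_up):", sloppy_dynamics[0, 0])  # ≈ 0.55
```

With these made-up numbers, the proper update just shifts P(stay up) to about 0.64, while the sloppy one also revises P(tired | stay up) from 0.9 down to about 0.55 — the wishful-thinking failure described above.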
This might happen because of some consistency mechanism which tries to mediate between different predictors. Perhaps if it gets one system saying “We will keep playing video games” and another saying “We mustn’t be tired tomorrow”, then the most reasonable update is that P(tired | stay up) is, in fact, influenceable.
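As a back-of-the-envelope illustration of why the mediator would land there (the 0.95 and 0.1 are made-up confidence levels): if one predictor insists P(stay up) ≈ 0.95 and another insists P(tired) ≈ 0.1, then any single joint distribution consistent with both forces P(tired | stay up) down to at most about 0.105, far below the true 0.9 — so reconciling the predictors looks exactly like deciding the conditional is controllable.

```python
# Hypothetical numbers: two confident predictors that the mediator must reconcile.
p_stay_up = 0.95   # "We will keep playing video games"
p_tired   = 0.10   # "We mustn't (so we won't) be tired tomorrow"

# Any joint distribution matching both marginals satisfies
#   P(tired | stay_up) = P(tired, stay_up) / P(stay_up) <= P(tired) / P(stay_up)
max_p_tired_given_stay_up = p_tired / p_stay_up
print(max_p_tired_given_stay_up)   # ≈ 0.105, far below the "true" 0.9
```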