Steven Byrnes comments on [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL

Steven Byrnes 4 Mar 2022 20:25 UTC
LW: 3 AF: 3
Thanks!
Right, I think there’s one reward function (well, one reward function that’s relevant for this discussion), and that for every thought we think, we’re thinking it because it’s rewarding to do so—or at least, more rewarding than alternative thoughts. Sometimes a thought is rewarding because it involves feeling good now, sometimes it’s rewarding because it involves an expectation of feeling good in the distant future, sometimes it’s rewarding because it involves an expectation that it will make your beloved friend feel good, sometimes it’s rewarding because it involves an expectation that it will make your admired in-group members very impressed with you, etc.
I think that the thing that gets rewarded is thoughts / plans, not just actions / states. So we don’t have to assume that the Thought Generator is proposing an action that’s unrewarding now (going to the gym) in order to get into a more-rewarding state later on (being ripped). Instead, the Thought Generator can generate one thought right now, “I’m gonna go to the gym so that I can get ripped”. That one thought can be rewarding right now, because the “…so that I can get ripped” is right there in the thought, providing evidence to the brainstem that the thought should be rewarded, and that evidence can plausibly outweigh the countervailing evidence from the “I’m gonna go to the gym…” part of the thought.
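To make the gym example concrete, here is a toy sketch, not a claim about how the brain actually computes anything. It assumes a thought can be broken into parts, that each part contributes some made-up amount of evidence for or against rewarding the thought, and that the evidence simply adds up. `Thought`, `reward`, `pick_thought`, and all the numbers are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Thought:
    text: str
    # Hypothetical evidence each part of the thought provides; the numbers are made up.
    evidence: Dict[str, float]


def reward(thought: Thought) -> float:
    """Score the whole thought right now by adding up the evidence from its parts."""
    return sum(thought.evidence.values())


def pick_thought(candidates: List[Thought]) -> Thought:
    """Return the candidate that is more rewarding than the alternatives."""
    return max(candidates, key=reward)


gym_plan = Thought(
    "I'm gonna go to the gym so that I can get ripped",
    {"go to the gym": -2.0, "get ripped": 3.5},
)
couch = Thought("I'm gonna stay on the couch", {"stay on the couch": 1.0})

chosen = pick_thought([gym_plan, couch])
print(chosen.text, reward(chosen))
```

The point of the sketch is that the score attaches to the single thought as a whole: the unpleasant "go to the gym" part and the appealing "get ripped" part are evaluated together, right now, rather than as an unrewarding action followed by a rewarding state.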
I do think there’s still an adjustable parameter in the brain related to time-discounting, even if the details are kinda different from normal RL. But I don’t see a strong connection between that and social instincts. For example, if you abstain from ice cream to avoid a stomach ache, that’s a time-discounting thing, but it’s not a social-instincts thing. It’s possible that social animals in general are genetically wired to time-discount less than non-social animals, but I don’t have any particular reason to expect that to be the case. Or maybe humans in particular are genetically wired to time-discount less than other animals; I don’t know. But if that’s true, I still wouldn’t expect it to have to do with humans being social. Rather, I would assume that it evolved because humans are smarter, and therefore human plans are unusually likely to work out as predicted, compared to other animals.
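For readers who want the ice cream example in numbers, here is a rough sketch. It assumes textbook exponential discounting from normal RL, which, as noted, may differ in its details from whatever the brain does, plus a hypothetical `p_works` factor standing in for how likely a prediction is to pan out. The function name `plan_value` and every number below are illustrative assumptions.

```python
def plan_value(immediate: float, future: float, delay: int,
               gamma: float, p_works: float = 1.0) -> float:
    """Immediate payoff plus a discounted, reliability-weighted future payoff.

    gamma is the per-step discount factor (closer to 1 means less discounting);
    p_works is an assumed probability that the predicted outcome really happens.
    """
    return immediate + (gamma ** delay) * p_works * future


# Eating ice cream: tasty now (+2), stomach ache later (-5), three time steps away.
patient = plan_value(immediate=2.0, future=-5.0, delay=3, gamma=0.9)
impatient = plan_value(immediate=2.0, future=-5.0, delay=3, gamma=0.5)
unreliable = plan_value(immediate=2.0, future=-5.0, delay=3, gamma=0.9, p_works=0.2)

# Abstaining is scored as 0, so a negative value means "skip the ice cream".
print(patient < 0, impatient < 0, unreliable < 0)
```

Under these assumptions, heavier discounting or less trustworthy predictions both push toward eating the ice cream, and nothing in the calculation involves social instincts. That is also why more reliable plans could plausibly favor less discounting.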
I think social instincts come from having things in the innate reward function that track “having high status in my in-group” and “being treated fairly” and “getting revenge” and so on. (To a first approximation.) Post #13(ish) will be a (hopefully) improved and updated version of this discussion of how such things might actually get incorporated into the reward function, given the difficulties related to symbol-grounding. You might also be interested in my post (Brainstem, Neocortex) ≠ (Base Motivations, Honorable Motivations).
Hope this helps, happy to talk more, either here or by phone if you think that would help. :)