gwern comments on [Intro to brain-like-AGI safety] 6. Big picture of motivation, decision-making, and RL

gwern 12 Apr 2022 20:34 UTC
LW: 9 AF: 3
An analogy that comes to mind is sociopathy, which is closely linked to fear/reward insensitivity and impulsivity. Something you see a lot in case studies of diagnosed sociopaths, or in accounts of people who look obviously like sociopaths, is that they will be going along just fine, seeming very competent and intelligent, getting away with everything, until they suddenly do something that is just reckless, pointless, and useless, something no sane person could possibly think they’d get away with. Why did they do X, which caused the whole house of cards to come tumbling down and is why you are now reading this book or longform investigative piece about them? No reason. They just sorta felt like it. The impulse just came to them. Like jumping off a bridge.
- Steven Byrnes 13 Apr 2022 17:31 UTC
  LW: 2 AF: 2
  Huh. I would have invoked a different disorder.
  I think that if we replace the Thought Assessor & Steering Subsystem with the function “RPE = +∞ (regardless of what’s going on)”, where RPE is the reward prediction error, the result is a manic episode, and if we replace it with the function “RPE = -∞ (regardless of what’s going on)”, the result is a depressive episode.
  In other words, the manic episode would be kinda like the brainstem saying “Whatever thought you’re thinking right now is a great thought! Whatever you’re planning is an awesome plan! Go forth and carry that plan out with gusto!!!!” And the depressive episode would be kinda like the brainstem saying “Whatever thought you’re thinking right now is a terrible thought. Stop thinking that thought! Think about anything else! Heck, think about nothing whatsoever! Please, anything but that thought!”
  My thoughts about sociopathy are here. Sociopaths can be impulsive (like everyone), but it doesn’t strike me as a central characteristic, as it is in mania. I think there might sometimes be situations where a sociopath does X, and onlookers characterize it as impulsive, but in fact it’s just what the sociopath wanted to do, all things considered, stemming from different preferences / a different reward function. For example, my impression is that sociopaths get very bored very easily, and will do something that seems crazy and inexplicable from a neurotypical perspective, but seems like a good way to alleviate boredom from their own perspective.
  (Epistemic status: Very much not an expert on mania or depression; I’ve just read a couple of papers. I’ve read a larger number of books and papers on sociopathy / psychopathy (which I think are synonyms?), plus there were two sociopaths in my life that I got to know reasonably well, unfortunately. More of my comments about depression here.)
- Rafael Harth 12 Apr 2022 21:16 UTC
  LW: 2 AF: 1
  Do you think this describes language models?
  - gwern 12 Apr 2022 21:17 UTC
    LW: 7 AF: 2
    ‘Insensitivity to reward or punishment’ does sound relevant...
    - Rafael Harth 12 Apr 2022 22:06 UTC
      LW: 2 AF: 1
      Yes, but I didn’t mean to ask whether it’s relevant, I meant to ask whether it’s accurate. Does the output of language models, in fact, feel like this? Seemed like something relevant to ask you since you’ve seen lots of text completions.
      
      And if it does, what is the reason for not having long timelines? If neural networks only solved the easy part of the problem, that implies that they’re a much smaller step toward AGI than many argued recently.
      - gwern 13 Apr 2022 0:38 UTC
        LW: 7 AF: 2
        I said it was an analogy. You were discussing what intelligent human-level entities with inhibition control problems would hypothetically look like; well, as it happens, we do have such entities, in the form of sociopaths, and as it happens, they do not simply explode in every direction due to lacking inhibitions but often perform at a high level manipulating other humans until they suddenly explode. This is a proof of concept that you can naturally get such streaky performance without any kind of exotic setup or design. Seems relevant to mention.