abramdemski comments on Matt Botvinick on the spontaneous emergence of learning algorithms

abramdemski 18 Aug 2020 14:54 UTC
LW: 9 AF: 5
AF
I think it makes more sense to operationalize “catastrophic” here as “leading to systematically low DA reward”, perhaps also including “manipulating the DA system in a clearly misaligned way”.
One way catastrophic alignment in this sense is difficult for humans is that the PFC cannot divorce itself from the DA; I’d expect that a failure mode leading to systematically low DA rewards would usually be corrected gradually, as the DA punishes those patterns.
However, this is not really clear. The misaligned PFC might e.g. put itself in a local maximum, where it creates DA punishment for giving in to temptation. (For example, an ascetic getting social reinforcement from a group of ascetics might be in such a situation.)
- Adam Scholl 18 Aug 2020 16:11 UTC
  1 point
  I think it makes more sense to operationalize “catastrophic” here as “leading to systematically low DA reward”
  Thanks—I do think this operationalization makes more sense than the one I proposed.
  - Adam Scholl 18 Aug 2020 17:42 UTC
    3 points
    One way catastrophic alignment in this sense is difficult for humans is that the PFC cannot divorce itself from the DA; I’d expect that a failure mode leading to systematically low DA rewards would usually be corrected
    I’m not sure divorce like this is rare. For example, anorexia sometimes causes people to find food anti-rewarding (repulsive/inedible, even when they’re dying and don’t want to be), and I can imagine that being because PFC actually somehow alters DA’s reward function.
    But I do share the hunch that something like a “divorce resistance” trick occurs and is helpful. I took Kaj and Steve to be gesturing at something similar elsewhere in the thread. But I notice feeling confused about how exactly this trick might work. Does it scale...?
    I have the intuition that it doesn’t: that as the systems increase in power, divorce occurs more easily. That is, I have the intuition that if PFC were trying, so to speak, to divorce itself from DA supervision, it could probably find some easy-ish way to succeed, e.g. by reconfiguring itself to hide activity from DA, or to send reward-eliciting signals to DA regardless of what goal it was pursuing.