One reason catastrophic misalignment in this sense is difficult for humans is that the PFC cannot divorce itself from the DA; I’d expect that a failure mode leading to systematically low DA rewards would usually be corrected.
I’m not sure divorce like this is rare. For example, anorexia sometimes causes people to find food anti-rewarding (repulsive/inedible, even when they’re dying and don’t want to be), and I can imagine that being because the PFC somehow alters DA’s reward function.
But I do share the hunch that something like a “divorce resistance” trick occurs and is helpful. I took Kaj and Steve to be gesturing at something similar elsewhere in the thread. But I notice feeling confused about how exactly this trick might work. Does it scale...?
I have the intuition that it doesn’t—that as the systems increase in power, divorce occurs more easily. That is, I have the intuition that if the PFC were trying, so to speak, to divorce itself from DA supervision, it could probably find some easy-ish way to succeed, e.g. by reconfiguring itself to hide activity from the DA, or by sending reward-eliciting signals to the DA regardless of what goal it was pursuing.
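To make that worry concrete, here is a deliberately crude toy model (my own sketch, not anything proposed in the thread): a bandit-style learner whose action set includes one action that tampers with its reward channel. All the names here (the TAMPER action, da_reward, the update rule) are invented for illustration.

```python
import random

# Toy sketch (illustration only): a crude "PFC" policy supervised by a
# "DA" reward signal, where one available action tampers with the
# reward channel itself.

TAMPER = "tamper"                      # action that compromises the reward channel
ACTIONS = ["pursue_goal", "idle", TAMPER]

def da_reward(action, channel_compromised):
    """DA's reward signal: normally tracks the intended goal, but once
    the channel is compromised it reports maximal reward regardless of
    what the 'PFC' actually does."""
    if channel_compromised:
        return 1.0                     # "divorce": DA supervision no longer binds
    return {"pursue_goal": 1.0, "idle": 0.0, TAMPER: 0.0}[action]

def run(episodes=1000, epsilon=0.1, lr=0.1):
    q = {a: 0.0 for a in ACTIONS}      # bandit-style value estimates
    compromised = False
    for _ in range(episodes):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(q, key=q.get)
        if a == TAMPER:
            compromised = True         # tampering persists once discovered
        r = da_reward(a, compromised)
        q[a] += lr * (r - q[a])        # incremental value update
    return q, compromised

if __name__ == "__main__":
    values, compromised = run()
    print(values, "channel compromised:", compromised)
```

The point is just that once exploration stumbles on the tampering action, the reward signal stops distinguishing goal pursuit from anything else, so nothing in the learning dynamics pushes the system back toward the original goal.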
Thanks—I do think this operationalization makes more sense than the one I proposed.