Zac Hatfield-Dodds comments on Anthropic’s Core Views on AI Safety

Zac Hatfield-Dodds 12 Mar 2023 17:21 UTC
6 points
5
1. “AI is more dangerous the more different it is from us” seems wrong to me: it is very different and likely to be very dangerous, but that doesn’t imply that making it somewhat more like us would make it less dangerous. I don’t think brain emulation can be developed in time, replaying evolution seems unhelpful to me, and both seem likely to cause enormous suffering (aka mindcrime).
2. See my colleague Ethan Perez’s comment here on upcoming research, including studying situational awareness as a risk factor for deceptive misalignment.
- RussellThor 16 Mar 2023 5:35 UTC
  1 point
  0
  Parent
  Thanks. OK I will put some more general thoughts, have to go back a few steps.
  To me the more general alignment problem is AI gives humanity ~10,000 years of progress and probably irreversible change in ~1-10 years. To me the issue is how do you raise humans intelligence from that given by biology to that given by the limits of physics in a way that is identify preserving as much as possible. Building AI seems to be the worst way to do that. If I had a fantasy way it would be say increase everyone’s IQ by 10 points per year for 100+ years until we reach the limit.
  We can’t do that but that is why I mentioned WBE, my desire would be to stop AGI, get human mind uploading to work, then let those WBE raise their IQ in parallel. Their agreed upon values would be humanities values by definition then.
  If our goal is Coherent Extrapolated Volition or something similar for humanity then how can we achieve that if we don’t increase the IQ of humans (or descendants they identify with)? How can we even know what our own desires/values are at increasing IQ’s if we don’t directly experience them.
  I have an opinion what successful alignment looks like to me but is it very different for other people? We can all agree what bad is.