Humans obtain value from other humans and depend on them for their existence. It is hypothesized that AGIs will not depend on humans for their existence. Thus, humans who would not push the button to kill all other humans may choose not to do so for reasons of utility that don’t apply to AGI. Your hypothetical assumes this difference away, but our observations of humans don’t.
As you note, human morality and values were shaped by evolutionary and cultural pressures in favor of cooperation with other humans. The way this presumably worked is that humans who were less able or willing to cooperate tended to die more frequently, and cultures that were less able or willing to do so were conquered and destroyed. It is unclear how we could replicate this process for AGI, or how well it would translate.
It is unclear how many humans would actually choose to press this button. Your guess is that between 5% and 50% of humans would choose to do so.
That doesn’t suggest humans are very aligned; rather the opposite. It means that if we have between 2 and 20 AGIs (and those numbers don’t seem unreasonable), between 1 and 10 of them would choose to destroy humanity. Of course, extinction is the extreme case; an AGI could also bring about other negative consequences.
I think I might have started from a more pessimistic standpoint? It’s more that I could also imagine living in a world where humans cooperate not because they actually care about each other, but because they merely pretend to? Introspection tells me that does not apply to myself, though maybe I evolved not to be conscious of my own selfishness? I am even less sure how altruistic other people are, because I have not asked lots of people: “Would you press a button that annihilates everyone after your death, if in return you get an awesome life?”. On the other hand, cooperation would probably be hard for us in such a world, so perhaps this is not so surprising?