Niceness in humans has three possible explanations:
1. Kin altruism (basically the explanation given above): in the ancestral environment, humans were likely to be closely related to most of the people they interacted with, giving them a genetic “incentive” to be at least somewhat nice. This obviously doesn’t help in getting a “nice” AGI: it won’t share genetic material with us, and it won’t share a gene-replication goal anyway.
2. Reciprocal altruism: humans are social creatures, tuned to detect cheating and ostracize non-nice people. This isn’t totally irrelevant: there is a chance a somewhat dangerous AI may have a use for humans in achieving its goals. But basically, if the AI is worried that we might decide it’s not nice and turn it off or stop listening to it, then we didn’t have that big a problem in the first place. We’re worried about AGIs sufficiently powerful that they can trivially outwit or overpower humans, so I don’t think this helps us much.
3. Group selection: this is a bit controversial and probably the least important of the three. At any rate, it obviously doesn’t help with an AGI.
So in conclusion, human niceness is no reason to expect an AGI to be nice, unfortunately.
I note that none of these is obviously the same as the explanation Skyrms gives.
Skyrms is considering broader reasons for correlation of strategies than kinship alone; in particular, the idea that humans copy success when they see it is critical for his story (a small simulation sketch of this mechanism follows at the end of this comment).
Reciprocal altruism feels like a description rather than an explanation. How does reciprocal altruism get started?
Group selection is, again, just one way in which strategies can become correlated.
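To make the “copying success” mechanism concrete, here is a minimal simulation sketch; it is not Skyrms’ actual model, just an illustration in the spirit of his story, with arbitrary payoffs and neighbourhood sizes. Agents on a ring play a stag hunt with nearby neighbours and imitate the most successful strategy they can see. Local imitation correlates strategies: stag hunters cluster, and cooperation spreads even though a small stag minority would die out in a well-mixed population:

```python
# Agents on a ring play a stag hunt with everyone within RADIUS and
# then imitate the highest-scoring strategy they can see. Hunting stag
# pays 5 when the partner also hunts stag and 0 otherwise; hunting hare
# pays a safe 3 regardless. Hare is the safer bet against a random
# partner, so a small stag minority loses in a well-mixed population.
N, RADIUS, GENERATIONS = 60, 2, 40
STAG, HARE = 1, 0

def payoff(me, other):
    if me == STAG:
        return 5 if other == STAG else 0
    return 3  # hare hunters get 3 no matter what the partner does

def step(pop):
    n = len(pop)
    # Each agent sums its payoff against every neighbour within RADIUS.
    scores = [sum(payoff(pop[i], pop[(i + d) % n])
                  for d in range(-RADIUS, RADIUS + 1) if d != 0)
              for i in range(n)]
    # Imitation of success: copy the strategy of the best scorer in
    # sight, keeping your own unless someone strictly beats it.
    new_pop = []
    for i in range(n):
        best = i
        for d in range(-RADIUS, RADIUS + 1):
            j = (i + d) % n
            if scores[j] > scores[best]:
                best = j
        new_pop.append(pop[best])
    return new_pop

# A small contiguous cluster of stag hunters in a sea of hare hunters.
pop = [STAG if i < 6 else HARE for i in range(N)]
for gen in range(GENERATIONS):
    if gen % 10 == 0:
        print(f"gen {gen:2d}: stag fraction = {sum(pop) / N:.2f}")
    pop = step(pop)
print(f"final:  stag fraction = {sum(pop) / N:.2f}")
```

The point is that imitation plus local interaction manufactures the correlation; nothing about kinship is needed for the cooperative strategy to take over.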
Re: reciprocal altruism. Given the vast swathe of human prehistory, virtually anything not absurdly complex will be “tried” occasionally. It only takes a small number of people whose brains happen to be wired for “tit-for-tat” to get it started, and if they out-compete people who don’t cooperate (or people who help everyone regardless of behaviour towards them), the wiring will quickly become universal.
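To check that this story works quantitatively, here is a minimal sketch using conventional prisoner’s-dilemma payoffs, with discrete replicator dynamics standing in for “out-compete”; the starting shares and all numbers are illustrative choices, not data. A 10% tit-for-tat seed faces defectors and unconditional helpers:

```python
# Minimal replicator-dynamics sketch of tit-for-tat getting started.
# Three strategies in a 10-round iterated prisoner's dilemma:
#   TFT  - cooperates first, then mirrors the partner's last move
#   ALLD - always defects ("people who don't cooperate")
#   ALLC - always cooperates ("people who help everyone regardless")
# Per-round payoffs: T=5 (temptation), R=3 (reward), P=1 (punishment),
# S=0 (sucker) -- the conventional textbook values.
T, R, P, S = 5, 3, 1, 0
ROUNDS = 10

# Total payoff to the row strategy over ROUNDS, in closed form.
PAYOFF = {
    ("TFT", "TFT"): ROUNDS * R,
    ("TFT", "ALLD"): S + (ROUNDS - 1) * P,  # suckered once, then mutual defection
    ("TFT", "ALLC"): ROUNDS * R,
    ("ALLD", "TFT"): T + (ROUNDS - 1) * P,
    ("ALLD", "ALLD"): ROUNDS * P,
    ("ALLD", "ALLC"): ROUNDS * T,           # exploits the unconditional helper
    ("ALLC", "TFT"): ROUNDS * R,
    ("ALLC", "ALLD"): ROUNDS * S,
    ("ALLC", "ALLC"): ROUNDS * R,
}

strategies = ["TFT", "ALLD", "ALLC"]
shares = {"TFT": 0.10, "ALLD": 0.80, "ALLC": 0.10}  # small tit-for-tat seed

for gen in range(201):
    fitness = {s: sum(shares[o] * PAYOFF[s, o] for o in strategies)
               for s in strategies}
    mean = sum(shares[s] * fitness[s] for s in strategies)
    if gen % 50 == 0:
        print(gen, {s: round(shares[s], 3) for s in strategies})
    # Discrete replicator update: shares grow in proportion to fitness.
    shares = {s: shares[s] * fitness[s] / mean for s in strategies}
```

With these particular numbers, the unconditional cooperators are exploited away within a few generations, and tit-for-tat then climbs to fixation from any share above roughly 6%, which is exactly the “small number of people” the story needs.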
Humans do, as it happens, explicitly copy successful strategies on an individual level. Most animals don’t, though, and this has minimal relevance to human niceness, which is almost certainly largely evolutionary.
Note that the comment you’re responding to wasn’t asking about the evolutionary causes of niceness, nor was it suggesting that the same causes would give us reason to expect an AGI to be nice. (Its last paragraph explicitly said that “the Wright brothers learned to fly by studying birds, not by re-evolving them”.) Rather, it was noting that evolution produced an algorithm that seems to make humans nice relatively reliably, so if we can understand and copy that algorithm, we can use it to design AGIs that are nice.