Fun post! Totally disagree that human values aren’t largely arbitrary. Even before you get into AIs whose value-determining machinery might be orders of magnitude different from ours, I think evolution could just as well have solved the problem of “what are some good innate drives that get humans to make more humans” in multiple different ways.
Obviously not while still keeping them human. But they could be tool-using omnivores with social instincts as different from ours as a crab leg is from a mouse leg.
These are the drives listed in the “Universal Drives” section:
Affection, friendship, love
Play, curiosity
Anger, envy
Each of them has utility in the singleplayer and multiplayer games we play in our lives. There are degrees of freedom in how they’re implemented, but they stabilize cooperation, which has value. I don’t think the word “arbitrary” is specific enough to be a crux here, but I agree the OP seems to be imagining too much convergence. Potāto potăto.
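To make the “stabilize cooperation” point a bit more concrete, here’s a toy iterated prisoner’s dilemma sketch (mine, not from the post): a retaliatory strategy, standing in very loosely for something anger-like, sustains cooperation with its own kind and caps how badly a defector can exploit it, while an unconditional cooperator gets farmed. The payoffs and strategy names are the standard textbook ones, chosen only for illustration.

```python
# Toy iterated prisoner's dilemma: does a retaliatory, "anger-like" drive
# stabilize cooperation? Illustrative only; standard payoffs T > R > P > S.

PAYOFF = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(history):
    """Cooperate first, then copy the opponent's last move (retaliation)."""
    return "C" if not history else history[-1][1]

def always_cooperate(history):
    return "C"

def always_defect(history):
    return "D"

def play(strat_a, strat_b, rounds=100):
    """Play two strategies against each other; each sees (my_move, their_move) history."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(hist_a), strat_b(hist_b)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append((a, b))
        hist_b.append((b, a))
    return score_a, score_b

if __name__ == "__main__":
    print("TFT vs TFT: ", play(tit_for_tat, tit_for_tat))         # mutual cooperation holds
    print("AllC vs AllD:", play(always_cooperate, always_defect))  # the cooperator gets farmed
    print("TFT vs AllD: ", play(tit_for_tat, always_defect))       # retaliation caps the loss
```

With these payoffs and 100 rounds you get (300, 300) for TFT vs TFT, (0, 500) for the unconditional cooperator against a defector, and (99, 104) for TFT against a defector: the retaliatory drive keeps cooperation worthwhile without being endlessly exploitable.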
For each drive above we can ask: “does an AI need a recognizable version of that value to saturate the environments we’re likely to get soon?” I think the answer is pretty much no for each?
We have an overhang currently, where humans have some deontological-ish tendencies to cooperate even where it’s not locally optimal. We’re exploitable. This works well when we’re the only players in the game, but collapses when flexible, selfish, fast replicators are introduced. I was surprised to see “Integrating humans” as the final section of the talk. I think we’re dead in these worlds, and all of the interesting cooperation happens after the bad players are outcompeted.
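Here’s a minimal replicator-dynamics sketch of the “collapses when flexible, selfish, fast replicators are introduced” claim, under assumed one-shot prisoner’s dilemma payoffs (my illustration, not a model of anything real): start with 99% unconditional cooperators, introduce 1% defectors, and watch the cooperator share go to zero.

```python
# Replicator dynamics sketch: a population of unconditional cooperators is
# invaded by a small fraction of defectors in a one-shot prisoner's dilemma.
# Numbers are illustrative only.

R, S, T, P = 3.0, 0.0, 5.0, 1.0  # reward, sucker, temptation, punishment

def step(x, dt=0.1):
    """One Euler step of the replicator equation for the cooperator share x."""
    fit_c = x * R + (1 - x) * S        # expected payoff of a cooperator
    fit_d = x * T + (1 - x) * P        # expected payoff of a defector
    avg = x * fit_c + (1 - x) * fit_d  # population-average payoff
    return x + dt * x * (fit_c - avg)

x = 0.99  # start with 99% deontological-ish cooperators
for t in range(200):
    x = step(x)
    if t % 40 == 0:
        print(f"t={t:3d}  cooperator share = {x:.3f}")
```

The point isn’t the specific curve, just that unconditional cooperation is dynamically unstable once anything willing and able to exploit it can replicate.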
Also, hi Charlie! We met a few months ago. I am the guy who asked “how’s that going?” when you mentioned you’re working on value learning, and you suggested I read your Goodhart series. I’ve been reading your stuff.
I think it’s a spectrum. Affection might range in specificity from “there are peers that are associated with specific good things happening (e.g. a specific food),” to “I seek out some peers’ company using specific sorts of social rituals, I feel better when they’re around using emotions that interact in specific ways with memory, motivation, and attention, I perform some specialized signalling behavior (e.g. grooming) towards them and am instinctively sensitive to their signalling in return, I cooperate with them and try to further their interests, but mostly within limited domains that match my cultural norm of friendship, etc.”