I think a lot of human “alignment” isn’t encoded in our brains, it’s encoded only interpersonally, in the fact that we need to negotiate with other humans of similar power. Once a human gets a lot of power, often the brakes come off. To the extent that’s true, alignment inspired by typical human architecture won’t work well for a stronger-than-human AI, and some other approach is needed.
I didn’t mean to suggest that any future approach has to rely on ‘typical human architecture’. I also believe the least aligned humans are less aligned with each other than the least aligned dolphins, elephants, or whales are with each other. Treating AGI as a new species, at least as distant from us as dolphins are, would be a good starting point.