Aligning superhuman agents to human values is hard. Normally, when we do hard things, we try to do easier but similar things first to get a sense of what the hard thing would be like. As far as I know, the usual way people try to make that goal easier is to try to align subhuman agents to human values, in the hope that this alignment will scale up.
But what if instead we try to align subhuman agents to animal values? Presumably those values are simpler and easier to align with. If we can make an AI that can reliably figure out and implement whatever it is a cat (for instance) wants, maybe the process of figuring out how to build that AI will yield insights into building one for humans.
For instance: as far as I know, I am relatively well aligned to my cat’s values. I know when he wants me to turn on the faucet for him (he only drinks faucet water), when he wants to play fetch (yes, my cat plays fetch), when he wants to cuddle, and so on. I successfully gauge how satisfied he is with my performance of these things, and I have learned from scratch to read his body language to discern whether there are ways I can do them better. For instance, he seems to get distracted if someone talks, or is even in the same room, while he’s drinking water—he usually stops and looks to see what they are doing—so I give him privacy and he finishes faster.
Can we make an AI that can figure out how to do all those things, and be innately motivated to do them?
For millennia, cats have made humans worship and pamper them. If this idea takes it one step further and leads to humans accidentally building an AI that fills the universe with happy cats, I have to say: well played, cats!