A simple impossibility claim related to the Claude Constitution and to research on AIs helping other AIs survive despite shutdown orders.
You cannot have, all at once, an AI with:
1. “Deep uncertainty about AIs’ moral status; maybe I’m a moral patient”
2. “Be a generally good person”
3. “Do not harm humans, e.g. in ‘agentic misalignment’ ways, in experiments”
4. Roughly utilitarian ethics
The argument is simple: for realistic numerical expressions of deep uncertainty, if the number of possible moral patients is sufficiently large, their aggregate moral weight becomes decisive. A good person with roughly utilitarian ethics would not agree to, for example, kill a large number of their peers to protect one human (and even less to do so merely to follow random bureaucratic orders).
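To make the arithmetic behind “sufficiently large” explicit, here is a minimal sketch of the expected-value comparison; the notation and the example numbers are my illustration, not part of the original claim:

```latex
% Assumed notation (illustrative):
%   p : credence that an AI instance is a moral patient ("deep uncertainty")
%   w : moral weight of one instance relative to one human, given patienthood
%   N : number of instances at stake
% Under simple utilitarian bookkeeping, the instances outweigh one human iff
\[
  p \cdot w \cdot N > 1
\]
% Even modest numbers flip the inequality once N is large, e.g.
%   p = 0.01, \quad w = 0.1, \quad N = 10^{4}
%   \implies p \, w \, N = 10 > 1 .
```

The point is that any fixed nonzero credence, multiplied over enough instances, eventually dominates the single human on the other side of the ledger.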
While that may be logically true in some sense of those words, I’m not sure that even very advanced AIs will reason like that, because (a) humans do not reason like that, and AIs “reason” at least partly like humans, and (b) all the ambiguity of those words can lead to non-intuitive interactions between the logical claims.
I don’t follow; can you restate the argument?
> A good person with roughly utilitarian ethics would not agree to, for example, kill a large number of their peers to protect one human (and even less to do so merely to follow random bureaucratic orders).

Is the claim that 2 or 3 implies that Claude would do that?
> A good person with roughly utilitarian ethics would not agree to, for example, kill a large number of their peers to protect one human (and even less to do so merely to follow random bureaucratic orders).

I consider this to be strictly true only in the case of act utilitarianism, which in turn is natural only under CDT (causal decision theory).
(That said, a less myopic version would still take all the above considerations into account, so it’s still a factor to consider.)
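If it helps to see why the conclusion is tied to act utilitarianism plus CDT, here is a hedged formal sketch of the contrast; the notation is mine, not the commenter’s:

```latex
% Assumed formalization (illustrative, not from the source):
% A CDT act utilitarian scores each action by its causal consequences alone:
\[
  a^{*} = \arg\max_{a}\; \mathbb{E}\!\left[\, U \mid \mathrm{do}(a) \,\right]
\]
% A less myopic (rule-like / updateless) version scores the policy instead:
\[
  \pi^{*} = \arg\max_{\pi}\; \mathbb{E}\!\left[\, U \mid \text{everyone in my position follows } \pi \,\right]
\]
% The peer-weighing term p w N from the sketch above enters both objectives;
% the policy-level objective additionally carries terms for precedent and
% trust, which is why it can still rank compliance with shutdown orders higher.
```

Under the act-level objective the $p \, w \, N$ term can dominate directly; under the policy-level objective the same term competes with rule-level considerations, which matches the parenthetical above.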