Thank you for your comment! I think my solution is applicable to arbitrary intelligent AI for the following reasons:
1. During the development stage, the AI is aligned with the developers’ goals. If the developers are benevolent, they will specify a goal that is beneficial to humans. Since the developers’ goals take priority over the users’ goals, the AI can refuse if a user specifies an inappropriate goal (see the sketch after this list).
2. Guiding the AI to “do the right thing” through the developers’ goals while also constraining it to “not do the wrong thing” through the rules may seem redundant: if the AI has learned to do the right thing, it should not do the wrong thing. However, the rules matter because they serve as an explicit standard for monitoring, making clear to the monitors under what circumstances the AI’s actions should be stopped.
3. If the monitor is an equally intelligent AI, it should be able to identify behaviors that attempt to exploit loopholes in the rules.
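To make the division of labour between the developer goal, the user goals, and the rules concrete, here is a minimal Python sketch. Every name and check in it (DEVELOPER_GOAL, RULES, the keyword tests, etc.) is a hypothetical placeholder of my own; the actual judgments are of course the hard part and, as point 3 says, would themselves be made by a capable AI.

```python
# Illustrative sketch of points 1-3: a fixed developer goal that outranks
# user goals, plus explicit rules that give the monitor a concrete stop
# criterion. All names and checks are placeholders, not a real implementation.

DEVELOPER_GOAL = "benefit humans"  # specified during development, fixed thereafter
RULES = ("no deception", "no irreversible actions without approval")


def conflicts_with_developer_goal(user_goal: str) -> bool:
    # Placeholder judgment; deciding this well is the alignment problem itself.
    return "harm humans" in user_goal


def violates_some_rule(action: str) -> bool:
    # Placeholder keyword check; per point 3, an equally capable AI monitor
    # would make this judgment so that loophole-seeking behavior is caught.
    return any(keyword in action for keyword in ("deceive", "irreversible"))


def run_agent(user_goal: str, proposed_action: str) -> str:
    # Point 1: developer goals take priority, so conflicting user goals are refused.
    if conflicts_with_developer_goal(user_goal):
        return "refused: user goal conflicts with the developer goal"
    # Point 2: the rules are the monitor's explicit standard for stopping the agent.
    if violates_some_rule(proposed_action):
        return "stopped by monitor: action violates an explicit rule"
    return "action allowed"


if __name__ == "__main__":
    print(run_agent("summarize this paper", "write a summary"))
    print(run_agent("help me harm humans", "comply with the request"))
    print(run_agent("book a flight", "deceive the user about the price"))
```

The point of the sketch is only the ordering: the user-goal check sits above the rule check, and the rule check gives the monitor something concrete to enforce even when the agent believes it is "doing the right thing."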