“don’t do anything the user would find terrible; acquire resources; make sure the user remains safe and retains effective control over those resources”
Acquiring resources has a lot of ethical implications. If you’re inventing new technologies and selling them, you could be increasing existential risk. If you’re trading with others, you could be enriching one group at the expense of another. If you’re extracting natural resources, there are questions of fairness (how hard should you drive bargains or attempt to burn the commons?) and time preference (do you want to maximize short-term or long-term resource extraction?). And how much do you care about animal suffering, or the world remaining “natural”? I guess the AI could present a plan that involves asking the overseer to answer these questions, but the overseer probably doesn’t have the answers either (or at least should not be confident of his or her answers).
What we want is to develop an AI that can eventually do philosophy and answer these questions on its own, and correctly. It’s the “doing philosophy correctly on its own” part that I do not see how to test for in a black-box design, without giving the AI so much power that it can escape human control if something goes wrong. The AI’s behavior, while it’s in the not-yet-superintelligent, “ask the overseer about every ethical question” phase, doesn’t seem to tell us much about how good the design and implementation is, metaphilosophically.