Besides Eliezer’s rather strong-looking argument, ethically creating Obedient AI would require solving the following scary problems:
A “nonperson predicate” that can ensure the AI doesn’t create simulations which themselves count as people. If we fail to solve this one, then I could be a simulation the AI made in order to test how people like me react to torture.
A way to ensure the AI itself does not count as a person, so that we don’t feel sad if it eventually switches itself off. See here for a fuller explanation of why this matters.
Now, I think Wei Dai suggested we start by building a “philosophical” AI that could solve such problems for us. I don’t think philosophy is a natural class. (A ‘correct way to do philosophy’ sounds like a fully general correct way to think and act.) But if we get the AI’s goals right, then maybe it could start out restricted by flawed and overcautious answers to these questions, but find us some better answers. Maybe.
I am aware of the need for those things (that’s part of what I mean by ‘need for friendliness in OAI’), but as far as I can tell, Paternalistic FAI requires you to solve those problems, plus simply ‘not being very powerful but insane’, plus a basic understanding of what matters to humans, plus the incredibly meta matter of human values. An OAI can leave off the last of those problems.
I meant that by going meta we might not have to solve them fully.
All the problems you list sound nearly identical to me. In particular, “what matters to humans” sounds more vague but just as meta. If it includes enough details to actually reassure me, you could just tell the AI, “Do that.” Presumably what matters to us would include ‘the ability to affect our environment, e.g. by giving orders.’ What do you mean by “very powerful but insane”? I want to parse that as ‘intelligent in the sense of having accurate models that allow it to shape the future, but not programmed to do what matters to humans.’
“Very powerful but insane”: the AI’s responses to orders seem to make less than no sense, yet the AI is still able to do damage.
“What matters to humans”: things like the Outcome Pump example, where any child would know that not dying is supposed to be part of “out of the building”, but not including the problems that we are bad at solving, such as fun theory and the like.