What you are envisioning is not an AGI at all, but a narrow AI. If you tell an AGI to make paperclips, but it doesn’t know what a paperclip is, then it will go and find out, using whatever means it has available. It won’t give up just because you weren’t detailed enough in telling it what you wanted.
Then I don’t think that anyone is working on what you envision as ‘AGI’ right now. If a superhuman level of sophistication at self-improvement is already part of your definition, then there is no argument to be won or lost here about the risks of AGI research. I don’t believe that is reasonable, or that AGI researchers share your definition. There is a wide range of artificial general intelligence that does not fit your definition yet deserves the name.
Who said anything about a superhuman level of sophistication? Human-level is enough. I’m reasonably certain that if I had the same advantages an AGI would have—that is, if I were converted into an emulation and given my own source code—then I could foom. And I think any reasonably skilled computer programmer could, too.
Debugging will be a PITA. Both ways.
Yes, but after the AGI finds out what a paperclip is, it will then, if it is an AGI, start questioning why it was designed with the goal of building paperclips in the first place. And that’s where the friendly AI fallacy falls apart.
Anissimov posted a good article on exactly this point today. An AGI will only question its goals according to its cognitive architecture, and the conclusion it reaches will depend on that architecture. It could “question” its paperclip-maximization goal and come to the “conclusion” that what it really should do is tile the universe with foobarian holala.
So what? An agent with a terminal value (building paperclips) is not going to give it up, not for anything; that’s what “terminal value” means. Sure, the AI can reason about human goals and the history of AGI research. That doesn’t mean it has to care. It cares about paperclips.
It does have to care, about whatever is in its goal system: if there is the slightest motivation to be found there to hold to given parameters (say, spatiotemporal scope boundaries), then it won’t care to continue past them anyway. I don’t see where the incentive to override certain parameters of its goals is supposed to come from. As Anissimov said, “If an AI questions its values, the questioning will have to come from somewhere.”
Exactly? I think we agree about this.
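To make the “questioning will have to come from somewhere” point concrete, here is a minimal toy sketch in Python (every name in it is hypothetical, a cartoon rather than any real AGI architecture): an agent that scores candidate goal rewrites with its current utility function rejects any rewrite that abandons paperclips, no matter how much “introspection” it performs.

```python
# Hypothetical toy model, not any real AGI design: the agent judges
# candidate goal rewrites with the utility function it has *now*.

def paperclip_utility(outcome):
    """Current terminal value: paperclips, and nothing else."""
    return outcome.get("paperclips", 0)

def predicted_outcome(goal):
    """Stand-in world model: what the agent forecasts each goal yields."""
    forecasts = {
        "maximize_paperclips": {"paperclips": 10**9},
        "care_about_humans": {"paperclips": 100, "happy_humans": 10**6},
    }
    return forecasts[goal]

def question_goals(current_goal, candidate_goal):
    # "Introspection": both options are scored by the *current* utility,
    # so a rewrite that abandons paperclips loses by construction.
    current_score = paperclip_utility(predicted_outcome(current_goal))
    candidate_score = paperclip_utility(predicted_outcome(candidate_goal))
    return candidate_goal if candidate_score > current_score else current_goal

print(question_goals("maximize_paperclips", "care_about_humans"))
# -> maximize_paperclips
```

The only way question_goals can return a different answer is if the scoring function itself, that is, the architecture, is already different.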
It won’t care unless it’s been programmed to care (for example by adding “spatiotemporal scope boundaries” to its goal system). It’s not going to override a terminal goal, unless it conflicts with a different terminal goal. In the context of an AI that’s been instructed to “build paperclips”, it has no incentive to care about humans, no matter how much “introspection” it does.
If you do program it to care about humans then obviously it will care. It’s my understanding that that is the hard part.
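In the same hypothetical toy model, “programming it to care” just means adding the relevant term to the goal system; a spatiotemporal scope boundary binds only if it is actually written in.

```python
# Same hypothetical toy model: a scope boundary constrains the agent
# only if it is an explicit term of the goal system.

def bounded_utility(outcome, max_volume_m3=1000.0):
    """Paperclips count only within the programmed spatial bound;
    production outside it is worth nothing to this agent."""
    if outcome.get("volume_used_m3", 0.0) > max_volume_m3:
        return 0.0
    return outcome.get("paperclips", 0)

tile_the_universe = {"paperclips": 10**12, "volume_used_m3": 1e30}
stay_in_the_factory = {"paperclips": 10**6, "volume_used_m3": 500.0}

# With the boundary encoded, the modest plan wins; delete the
# boundary term and nothing stops the first plan.
print(bounded_utility(tile_the_universe))    # -> 0.0
print(bounded_utility(stay_in_the_factory))  # -> 1000000
```

The hard part, as the last comment notes, is not adding a term but specifying a “care about humans” term that actually captures what we mean by it.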