I have thought about the “create a safe genie, use it to prevent existential risks, and have human researchers think about the full FAI problem over a long period of time” route, and I find it appealing sometimes. But there are quite a lot of theoretical issues in creating a safe genie!
That is absolutely not a route I would consider. If that’s what you took away from my suggestion, please re-read it! My suggestion is that MIRI should consider pathways to leveraging superintelligence which don’t involve agent-y processes (genies) at all: processes that are incapable of taking action themselves, whose internal workings are audited in real time and programmatically constrained so that deception is detectable. Tools used as cognitive enhancers, not stand-alone cognitive artifacts with their own built-in goals.
SIAI spent a decade building up awareness of the problems that arise from superintelligent machine agents. MIRI has presumed from the start that the way to counteract this threat is to build a provably safe agent. I have argued that this is the wrong lesson to draw: the better path forward is not to create non-human agents of any kind at all!
For one, even a ‘tool’ could return a catastrophic solution that humans might unwittingly implement. Secondly, it’s conceivable that ‘tool AIs’ can ‘spontaneously agentize’, in which case you might as well build an agent on purpose for the sake of greater predictability and transparency. That is, as soon as you talk about leveraging ‘superintelligence’ rather than ‘intelligence’, you’re talking about software with qualitatively different algorithms; software that not only searches for solutions but plans how to go about finding them. (You might say, “Ah, but that’s where your mistake begins! We shouldn’t let it plan! That’s too agent-y!” Then it ceases to be superintelligence; those are precisely the cognitive tasks we would be outsourcing.) It seems that at a certain point on the scale of intelligence, tool AIs move quickly from ‘not unprecedentedly useful’ to ‘just as dangerous as agents’, and are thus not worth pursuing.
There’s a more nuanced way to put what I’ve said above. I’ve really never understood all of the fuss about whether we should use tools, oracles, genies, or sovereigns; the differences seem irrelevant. ‘Don’t design it such that it has goal-directed behavior,’ or ‘design it such that it must demonstrate solutions instead of performing them,’ or ‘design it such that it can only act on our command,’ seem like they’re in a similar class of mistake as ‘design the AI so that it values our happiness’ or some such: the sort of solution you propose when you haven’t thought about the problem in enough technical detail and have only talked about it in natural language. Those are all ad hoc safety procedures.

I’ve always thought of ‘agent’ as a term of convenience. Powerful optimization processes happen to produce effects similar to the effects produced by the things we refer to when we discuss ‘agents’ in natural language. Natural language is convenient, but imprecise; ultimately, we’re talking about optimization processes in every case. Far be it from me to speak for them, but I don’t interpret MIRI as advocating agents over everything else per se, so much as advocating formally verified optimization processes over optimization processes constrained by ad hoc safety procedures; speaking of ‘agents’ is simply the most accurate way to state one facet of that advocacy in natural language.
To summarize: The difference between tool AIs and agents is the difference between a human perceiving an optimization process in non-teleological and teleological terms, respectively. If the optimization process itself is provably safe, then the ad hoc safety procedures (‘no explicitly goal-directed behavior,’ ‘demonstrations only; no actions,’ etc.) will be unnecessary; if the optimization process is not safe, then the ad hoc safety procedures will be insufficient; given these points, conceiving of AGIs as tools is a distraction from other work.
EDIT: I’ve been looking around since I wrote this, and I’m highly encouraged that Vladimir_Nesov and Eliezer have made similar points about tools, and Eliezer has also made a similar point about oracles. My point generalizes theirs: Optimization power is what makes AGI useful and what makes it dangerous. Optimization processes hit low-probability targets in large search spaces, and the target is a ‘goal.’ Tools aren’t AIs ‘without’ goals, as if that would mean anything; they’re AIs with implicit, unspecified goals. You’re not making them Not-Goal-Directed; you’re unnecessarily leaving the goals up for grabs.
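The point that a ‘tool’ still has an implicit goal can be made concrete with a toy sketch (all names and the scoring criterion here are hypothetical, purely for illustration): a search procedure that only *ranks* candidate solutions for human review takes no actions itself, yet whatever its scoring criterion rewards is the target it hits.

```python
def tool_search(candidates, score):
    """A 'tool' that merely ranks candidate solutions for a human to review.

    It performs no actions itself, yet the ranking criterion `score` is still a
    goal in the relevant sense: whatever `score` rewards, the search will
    preferentially surface, whether or not we choose to call this an 'agent'.
    """
    return max(candidates, key=score)

# Hypothetical proxy objective: prefer integers close to 42. The proxy is the
# implicit goal, even though the tool 'only suggests' an answer for review.
best = tool_search(range(100), score=lambda x: -abs(x - 42))
print(best)  # the search surfaces the target that the score function defines
```

Leaving `score` unexamined is exactly what the comment above calls leaving the goals up for grabs: the optimization pressure is there either way.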
How would you prevent others from building agent-type AIs, though?