In defense of Oracle (“Tool”) AI research

(Update 2022: Enjoy the post, but note that it’s old, has some errors, and is certainly not reflective of my current thinking. –Steve)

Low confidence; offering this up for discussion

An Oracle AI is an AI that only answers questions, and doesn’t take any other actions. The opposite of an Oracle AI is an Agent AI, which might also send emails, control actuators, etc.

I’m especially excited about the possibility of non-self-improving oracle AIs, dubbed Tool AI in a 2012 article by Holden Karnofsky.

I’ve seen two arguments against this “Tool AI”:

  • First, as in Eliezer’s 2012 response to Holden, we don’t know how to safely make and operate an oracle AGI (just as with every other type of AGI). Fair enough! I never said this was an easy solution to all our problems! (But see my separate post for why I’m thinking about this.)

  • Second, as in Gwern’s 2016 essay, there’s a coordination problem. Even if we could build a safe oracle AGI, the argument goes, there will still be an economic incentive to build an agent AGI, because an AGI empowered to take actions can accomplish more, better, and faster. Thus, agreeing to never ever build agent AGIs is a very hard coordination problem for society. I don’t find the coordination argument compelling—in fact, I think it’s backwards—and I wrote this post to explain why.

Five reasons I don’t believe the coordination/competitiveness argument against oracles

1. If the oracle isn’t smart or powerful enough for our needs, we can solve that by bootstrapping. Even if the oracle is not inherently self-modifying, we can ask it for advice and do human-in-the-loop modifications to make more powerful successor oracles. By the same token, we can ask an oracle AGI for advice about how to design a safe agent AGI.

2. Avoiding coordination problems is a pipe dream; we need to solve the coordination problem at some point, and that point might as well be at the oracle stage. As far as I can tell, we will never get to a stage where we know how to build safe AGIs and where there is no possibility of making more-powerful-and-less-safe AGIs. If we have a goal in the world that we really really want to happen, a low-impact agent is going to be less effective than an agent without impact restraints; an act-based agent is going to be less effective than a goal-seeking agent;[1] and so on and so forth. It seems likely that, no matter how powerful a safe AGI we can make, there will always be an incentive for people to experiment with even more powerful unsafe alternative designs.

Therefore, at some point in AI development, we have to blow the whistle, declare that technical solutions aren’t enough, and start relying 100% on actually solving the coordination problem. When is that point? Hopefully far enough along that we realize the benefits of AGI for humanity—automating the development of new technology to help solve problems, dramatically improving our ability to think clearly and foresightedly about our decisions, and so on. Oracles can do all that! So why not just stop when we get to AGI oracles?

Indeed, once I started thinking along these lines, I began to see the coordination argument going in the other direction! I say restricting ourselves to oracle AI makes coordination easier, not harder! Why is that? Two more reasons:

3. We want a high technological barrier between us and the most dangerous systems: These days, I don’t think anyone takes seriously the idea of building an all-powerful benevolent dictator AGI implementing CEV. [ETA: If you do take that idea seriously, see point 1 above on bootstrapping.] At least as far as I can tell from the public discourse, there seems to be a growing consensus that humans should always and forever be in the loop of AGIs. (That certainly sounds like a good idea to me!) Thus, the biggest coordination problem we face is: “Don’t ever make a human-out-of-the-loop free-roaming AGI world-optimizer.” This is made easier by having a high technological barrier between the safe AGIs that we are building and using, and the free-roaming AGI world-optimizers that we are forbidding. If we make an agent AGI—whether corrigible, aligned, norm-following, low-impact, or whatever—I just don’t see any technological barrier there. It seems like it would be trivial for a rogue employee to tweak such an AGI to stop asking permission, deactivate the self-restraint code, and go tile the universe with hedonium at all costs (or whatever that rogue employee happens to value). By contrast, if we stop when we get to oracle AI, it seems like there would be a higher technological barrier to turning it into a free-roaming AGI world-optimizer—probably not that high a barrier, but higher than the alternatives. (The height of this technological barrier, and indeed whether there’s a barrier at all, is hard to say… It probably depends on how exactly the oracles are constructed and access-controlled.)

4. We want a bright-line, verifiable rule between us and the most dangerous systems: Even more importantly, take the rule:

“AGIs are not allowed to do anything except output pixels onto a screen.”

This is a nice, simple, bright-line rule, which moreover has at least a chance of being verifiable by external auditors. By contrast, if we try to draw a line through the universe of agent AGIs, defining how low-impact is low-impact enough, how act-based is act-based enough, and so on, it seems to me like it would inevitably be a complicated, blurry, and unenforceable line. This would make a very hard coordination problem very much harder still.

[Clarifications on this rule: (A) I’m not saying this rule would be easy to enforce (globally and forever), only that it would be less hard than alternatives; (B) I’m not saying that, if we enforce this rule, we are free and clear of all possible existential risks, but rather that this would be a very helpful ingredient along with other control and governance measures; (C) Again, I’m presupposing here that we succeed in making superintelligent AI oracles that always give honest and non-manipulative answers; (D) I’m not saying we should outlaw all AI agents, just that we should outlaw world-modeling AGI agents. Narrow-AI robots and automated systems are fine. (I’m not sure exactly how that line would be drawn.)]

Finally, one more thing:

5. Maybe superintelligent oracle AGI is “a solution built to last (at most) until all contemporary thinking about AI has been thoroughly obsoleted… I don’t think there is a strong case for thinking much further ahead than that” (copying from this Paul Christiano post). I hate this argument. It’s a cop-out. It’s an excuse to recklessly plow forward with no plan and everything at stake. But I have to admit, it seems to have a kernel of truth…

  1. See Paul’s research agenda FAQ, section 0.1, for things that act-based agents are unlikely to be able to do. ↩︎