First, there’s the political problem: if you can build agent AI and simply choose not to, that restraint doesn’t help much when someone else builds their UFAI (which they’ll want to do, because agent AI is very powerful and therefore very useful). So you have to get everyone on board with the plan first. Also, having your superintelligent oracle makes it much easier for someone else to build an agent: just ask the oracle how. If you don’t solve Friendliness, you have to solve the incentives instead, and “solve politics” doesn’t look much easier than “solve metaethics.”
Second, the distinction between agents and oracles gets fuzzy when the AI is much smarter than you. Suppose you ask the AI how to reduce gun violence: it spits out a bunch of complex policy changes whose effects you can’t easily predict. But you implement them, and it turns out that they drastically reduce people’s willingness to have children. The population plummets, and gun violence deaths do too. “Okay, how do I reduce per capita gun violence?”, you ask. More complex policy changes; this time they increase pollution in a way that disproportionately depopulates the demographics most likely to commit gun violence. “How do I reduce per capita gun violence without altering the size or demographic ratios of the population?” Its recommendations cause a worldwide collapse of the firearms manufacturing industry, and gun violence plummets, along with most metrics of human welfare.
If you have to blindly implement policies you can’t understand, you’re not really much better off than letting the AI implement them directly. There are some things you can do to mitigate this, but ultimately the AI is smarter than you. If you could fully understand all its ideas, you wouldn’t have needed to ask it.
Does this sound familiar? It’s the untrustworthy genie problem again. We need a trustworthy genie, one that will answer the questions we mean to ask, not just the questions we actually ask. So we need an oracle that understands and implements human values, which puts us right back at the original problem of Friendliness!
Non-agent AI might be a useful component of realistic safe AI development, just as “boxing” might be. Seatbelts are a good idea too, but they only matter once something has already gone wrong. Similarly, oracle AI might help, but it’s not a replacement for solving the actual problem.