Wow, there is a world of assumptions wrapped up in there. For example, that the AI has a concept of external agents and an ability to model their internal belief states; that an external agent can hold a belief about the world which is wrong. This may sound intuitively obvious, but it’s not a simple thing. This kind of social awareness takes time for humans to learn as well. Heinz Wimmer and Josef Perner showed that below a certain age (3-4 years), kids lack the ability to track this information. A teacher puts a toy in a blue cupboard and leaves the room; you move it to the red cupboard; the teacher comes back in. If you ask the kid not where the toy is, but which cupboard the teacher will look in to find it, they will say the red cupboard: they cannot yet represent the teacher’s (now false) belief separately from the actual state of the world.
It’s no accident that this skill takes time to develop. Keeping track of and simulating the states of mind of other agents acting in our world is actually quite complex. We just take it for granted because we are all well-adjusted adults of a species evolved for social intelligence. But an AI need not think that way, and indeed some of the most interesting use cases for tool AI (“design me a nanofactory constructible with existing tools” or “design a set of experiments organized as a decision tree for accomplishing the SENS research objectives”) would be best served by an idiot savant with no need for social awareness.
I think it goes without saying that obvious AI safety rule #1 is: don’t connect a UFAI to the internet. Another obvious rule, I think, is: don’t build in capabilities that aren’t required to achieve the things it is tasked with. For the applications of AI I imagine in the pre-singularity timeframe, social intelligence is not a requirement. So when you say “part of its model of the world involves a model of its controllers”, I think that is assuming a capability the AI should not have built in.
(This is all predicated on soft-enough takeoff that there would be sufficient warning if/when the AI self-developed a social awareness capability.)
What 27chaos said is also worth articulating in my own words. If you want to prevent an intelligent agent from taking a particular category of actions, there are two ways of achieving that requirement: (a) have a filter or goal system which prevents the AI from taking (box) or selecting (goal) actions of that type; or (b) prevent it by design from thinking such thoughts to begin with. An AI won’t take actions it never even considered in the first place. While the latter approach isn’t really possible with unbounded universal inference engines (since “enumerate all possibilities” is usually a step in their construction), such designs arise quite naturally out of more realistic psychology-inspired designs.
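The contrast between (a) and (b) can be sketched in code. This is a toy illustration, not a real agent design; the action categories and all names here are hypothetical:

```python
# Illustrative sketch of two ways to keep an agent away from a forbidden
# category of actions. All names and the toy action set are hypothetical.
from dataclasses import dataclass

FORBIDDEN = {"social_manipulation"}

@dataclass(frozen=True)
class Action:
    name: str
    category: str

ALL_ACTIONS = [
    Action("run_simulation", "computation"),
    Action("persuade_operator", "social_manipulation"),
    Action("design_experiment", "computation"),
]

# (a) Filter approach: the agent enumerates every action, and a gatekeeper
#     vetoes forbidden ones after they have already been considered.
def filtered_actions(candidates):
    return [a for a in candidates if a.category not in FORBIDDEN]

# (b) Generative-restriction approach: the action space handed to the
#     planner simply never contains the forbidden category, so such
#     actions are never even represented during search.
def restricted_action_space():
    return [a for a in ALL_ACTIONS if a.category not in FORBIDDEN]

# Both leave the same surviving actions, but in (a) "persuade_operator"
# was generated and then rejected, while in (b) it never entered the
# search at all.
assert filtered_actions(ALL_ACTIONS) == restricted_action_space()
```

The observable behavior is identical; the safety difference is that in (b) there is no internal representation of the forbidden action for a clever optimizer to route around.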
The approach to AGI safety that you’re outlining (keep it as a tool AI, don’t give it sophisticated social modeling capability, never give it access to the Internet) is one that I agree should keep the AGI safely contained in most cases. But my worry is that this particular approach being safe isn’t actually very useful, because there are going to be immense incentives to give the AGI more general capabilities and have it act more autonomously. As we wrote in Responses to Catastrophic AGI Risk:
As with a boxed AGI, there are many factors that would tempt the owners of an Oracle AI to transform it into an autonomously acting agent. Such an AGI would be far more effective in furthering its goals, but also far more dangerous.
Current narrow-AI technology includes high-frequency trading (HFT) algorithms, which make trading decisions within fractions of a second, far too fast to keep humans in the loop. HFT seeks to make a very short-term profit, but even traders looking for longer-term investments benefit from being faster than their competitors. Market prices are also very effective at incorporating various sources of knowledge [135]. As a consequence, a trading algorithm's performance might be improved both by making it faster and by making it more capable of integrating various sources of knowledge. Most advances toward general AGI will likely be taken advantage of quickly in the financial markets, with little opportunity for a human to vet all the decisions. Oracle AIs are unlikely to remain pure oracles for long.
Similarly, Wallach [283] discusses the topic of autonomous robotic weaponry and notes that the US military is seeking to eventually transition to a state where the human operators of robot weapons are ‘on the loop’ rather than ‘in the loop’. In other words, whereas a human was previously required to explicitly give the order before a robot could initiate possibly lethal activity, in the future humans are meant to merely supervise the robot's actions and intervene if something goes wrong.
Human Rights Watch [90] reports on a number of military systems which are becoming increasingly autonomous, with the human oversight for automatic weapons defense systems (designed to detect and shoot down incoming missiles and rockets) already being limited to accepting or overriding the computer's plan of action in a matter of seconds. Although these systems are better described as automatic, carrying out pre-programmed sequences of actions in a structured environment, than as autonomous, they are a good demonstration of a situation where rapid decisions are needed and the extent of human oversight is limited. A number of militaries are considering the future use of more autonomous weapons.
In general, any broad domain involving high stakes, adversarial decision-making, and a need to act rapidly is likely to become increasingly dominated by autonomous systems. The extent to which the systems will need general intelligence will depend on the domain, but domains such as corporate management, fraud detection, and warfare could plausibly make use of all the intelligence they can get. If one's opponents in the domain are also using increasingly autonomous AI/AGI, there will be an arms race in which one might have little choice but to give increasing amounts of control to AI/AGI systems.
Miller [189] also points out that if a person were close to death, whether due to natural causes, being on the losing side of a war, or any other reason, they might set even a potentially dangerous AGI system free. This would be a rational course of action as long as they primarily valued their own survival and thought that even a small chance of the AGI saving their life was better than a near-certain death.
Some AGI designers might also choose to create less constrained and more free-acting AGIs for aesthetic or moral reasons, preferring advanced minds to have more freedom.
So while I agree that a strict boxing approach would be sufficient to contain the AGI, it only works if everyone actually uses it; what we need is an approach that works for more autonomous systems as well.
If you want to prevent an intelligent agent from taking a particular category of actions there are two ways of achieving that requirement: (a) have a filter or goal system which prevents the AI from taking (box) or selecting (goal) actions of that type; or (b) prevent it by design from thinking such thoughts to begin with. An AI won’t take actions it never even considered in the first place. While the latter course of action isn’t really possible with unbounded universal inference engines (since “enumerate all possibilities” is usually a step in their construction), such designs arise quite naturally out of more realistic psychology-inspired designs.
Hmm. That sounds like a very interesting idea. While I actually agree that tool AI goals can be programmed, if you want to keep the whole thing from turning unsafely agenty, you’re going to have to strictly separate the inductive reasoning from the actual tool run: run induction for a while, then use tool mode to compose plans over the induced models of the world, potentially after censoring those models for safety.