Computation Hazards

This is a summary of material from various posts and discussions. My thanks to Eliezer Yudkowsky, Daniel Dewey, Paul Christiano, Nick Beckstead, and several others.

Several ideas have been floating around LessWrong that can be organized under one concept, relating to a subset of AI safety problems. I’d like to gather these ideas in one place so they can be discussed as a unified concept. To give a definition:

A computation hazard is a large negative consequence that may arise merely from vast amounts of computation, such as in a future supercomputer.

For example, suppose a computer program needs to model people very accurately to make some predictions, and it models those people so accurately that the “simulated” people can experience conscious suffering. In a very large computation of this type, millions of people could be created, suffer for some time, and then be destroyed when they are no longer needed for making the predictions desired by the program. This idea was first mentioned by Eliezer Yudkowsky in Nonperson Predicates.

There are other hazards that may arise in the course of running large-scale computations. In general, we might say that:

Large amounts of computation will likely consist in running many diverse algorithms. Many algorithms are computation hazards. Therefore, all else equal, the larger the computation, the more likely it is to produce a computation hazard.

Of course, most algorithms may be morally neutral. Furthermore, algorithms must be somewhat complex before they could possibly be a hazard. For instance, it is intuitively clear that no eight-bit program could possibly be a computation hazard on a normal computer. Worrying computations therefore fall into two categories: computations that run most algorithms, and computations that are particularly likely to run algorithms that are computation hazards.

An example of a computation that runs most algorithms is a mathematical formalism called Solomonoff induction. First published in 1964, it is an attempt to formalize the scientific process of induction using the theory of Turing machines. It is a brute-force method that finds hypotheses to explain data by testing all possible hypotheses. Many of these hypotheses may be algorithms that describe the functioning of people. At a sufficient precision, these algorithms themselves may experience consciousness and suffering. Taken literally, Solomonoff induction runs all algorithms; therefore it produces all possible computation hazards. If we are to avoid computation hazards, any implemented approximations of Solomonoff induction will need to determine ahead of time which algorithms are computation hazards.

Computations that run most algorithms could also hide in other places. Imagine a supercomputer’s power is being tested on a simple game, like chess or Go. The testing program simply tries all possible strategies, according to some enumeration. The best strategy that the supercomputer finds would be a measure of how many computations it could perform, compared to other computers that ran the same program. If the rules of the game are complex enough to be Turing complete (a surprisingly easy achievement) then this game-playing program would eventually simulate all algorithms, including ones with moral status.

Of course, running most algorithms is quite infeasible simply because of the vast number of possible algorithms. Depending on the fraction of algorithms that are computation hazards, it may be enough that a computation run an enormous number which act as a random sample of all algorithms. Computations of this type might include evolutionary programs, which are blind to the types of algorithms they run until the results are evaluated for fitness. Or they may be Monte Carlo approximations of massive computations.

But if computation hazards are relatively rare, then it will still be unlikely for large-scale computations to stumble across them unguided. Several computations may fall into the second category of computations that are particularly likely to run algorithms that are computation hazards. Here we focus on three types of computations in particular: agents, predictors and oracles. The last two types are especially important because they are often considered safer types of AI than agent-based AI architectures. First I will stipulate definitions for these three types of computations, and then I will discuss the types of computation hazards they may produce.

Agents

An agent is a computation which decides between possible actions based on the consequences of those actions. They can be thought of as “steering” the future towards some target, or as selecting a future from the set of possible futures. Therefore they can also be thought of as having a goal, or as maximizing a utility function.

Sufficiently powerful agents are extremely powerful because they constitute a feedback loop. Well-known from physics, feedback loops often change their surroundings incredibly quickly and dramatically. Examples include the growth of biological populations, and nuclear reactions. Feedback loops are dangerous if their target is undesirable. Agents will be feedback loops as soon as they are able to improve their ability to improve their ability to move towards their goal. For example, humans can improve their ability to move towards their goal by using their intelligence to make decisions. A student aiming to create cures can use her intelligence to learn chemistry, therefore improving her ability to decide what to study next. But presently, humans cannot improve their intelligence, which would improve their ability to improve their ability to make decisions. The student cannot yet learn how to modify her brain in order for her to more quickly learn subjects.

Predictors

A predictor is a computation which takes data as input, and predicts what data will come next. An example would be certain types of trained neural networks, or any approximation of Solomonoff induction. Intuitively, this feels safer than an agent AI because predictors do not seem to have goals or take actions; they just report predictions as requested by human.

Oracles

An oracle is a computation which takes questions as input, and returns answers. They are broader than predictors in that one could ask an oracle about predictions. Similar to a predictor, oracles do not seem to have goals or take actions. (Some material summarized here.)

Examples of hazards

Agent-like computations are the most clearly dangerous computation hazards. If any large computation starts running the beginning of a self-improving agent computation, it is difficult to say how far the agent may safely be run before it is a computation hazard. As soon as the agent is sufficiently intelligent, it will attempt to acquire more resources like computing substrate and energy. It may also attempt to free itself from control of the parent computation.

Another major concern is that, because people are an important part of the surroundings, even non-agent predictors or oracles will simulate people in order to make predictions or give answers respectively. Someone could ask a predictor, “What will this engineer do if we give him a contract?” It may be that the easiest way for the predictor to determine the answer is to simulate the internal workings of the given engineer’s mind. If these simulations are sufficiently precise, then they will be people in and of themselves. The simulations could cause those people to suffer, and will likely kill them by ending the simulation when the prediction or answer is given.

Similarly, one can imagine that a predictor or oracle might simulate powerful agents; that is, algorithms which efficiently maximize some utility function. Agents may be simulated because many agent-like entities exist in the real world, and their behavior would need to be modeled. Or, perhaps oracles would investigate agents for the purpose of answering questions better. These agents, while being simulated, may have goals that require acting independently of the oracle. These agents may also be more powerful than the oracles, especially since the oracles were not designed with self-improvement behavior in mind. Therefore these agents may attempt to “unbox” themselves from the simulation and begin controlling the rest of the universe. For instance, the agents may use previous questions given to the oracle to deduce the nature of the universe and the psychology of the oracle-creators. (For a fictional example, see That Alien Message.) Or, the agent might somehow distort the output of the predictor, in a way that what the oracle predicts will cause us to unbox the agent.

Predictors also have the problem of self-fulfilling prophecies (first suggested here). An arbitrarily accurate predictor will know that its prediction will affect the future. Therefore, to be a correct prediction, it must make sure that delivering its prediction doesn’t cause the receiver to act in a way that negates the prediction. Therefore, the predictor may have to choose between predictions which cause the receiver to act in a way that fulfills the prediction. This is a type of control over the user. Since the predictor is super-intelligent, any control may rapidly optimize the universe towards some unknown goal.

Overall, there is a large worry that sufficiently intelligent oracles or predictors may become agents. Beside the above possibilities, some are worried that intelligence is inherently an optimization process, and therefore oracles and predictors are inherently satisfying some utility function. This, combined with the fact that nothing can be causally isolated from the rest of the universe, seems to invite an eventual AI-takeoff.

Methods for avoiding computational hazards

It is often thought that, while no proposal has yet been shown safe from computational hazards, oracles and predictors are safer than deliberately agent-based AGI. Other methods have been proposed to make these even safer. Armstrong et al. describe many AI safety measures in general. Below we review some possible techniques for avoiding computational hazards specifically.

One obvious safety practice is to limit the complexity, or the size of computations. In general, this will also limit the algorithm below general intelligence, but it is a good step while progressing towards FAI. Indeed, it is clear that all current prediction or AI systems are too simple to either be general intelligences, or pose as a computational hazard.

A proposal for regulating complex oracles or predictors is to develop safety indicators. That is, develop some function that will evaluate the proposed algorithm or model, and return whether it is potentially dangerous. For instance, one could write a simple program that rejects running an algorithm if any part of it is isomorphic to the human genome (since DNA clearly creates general intelligence and people under the right circumstances). Or, to measure the impact of an action suggested by an oracle, one could ask how many humans would be alive one year after the action was taken.

But one could only run an algorithm if they were sure it was not a person. A function that could evaluate an algorithm and return 0 only if it is not a person is called a nonperson predicate. Some algorithms are obviously not people. For example, squaring the numbers from 1 to 100 will not simulate people. Any algorithm whose behavior is periodic with a short period is unlikely to be a person, or nearly any presently constructed software. But in general this seems extremely difficult to verify. It could be that writing nonperson predicates or other safety indicators is FAI-complete in that sense that if we solve them, we will have discovered friendliness theory. Furthermore, it may be that some attempts to evaluate whether an algorithm is a person actually causes a simulation of a person, by running parts of the algorithm, by modeling a person for comparison, or by other means. Similarly, it may be that attempts to investigate the friendliness of a particular agent cause that agent to unbox itself.

Predictors seem to be one of the most goal-agnostic forms of AGI. This makes them a very attractive model in which to perfect safety. Some ideas for avoiding self-fulfilling predictions suggest that we ask the predictor to tell us what it would have predicted if we hadn’t asked (first suggested here). This frees the predictor from requiring itself to make predictions consistent with our behavior. Whether this will work depends on the exact process of the predictor; it may be so accurate that it cannot deal with counterfactuals, and will simply report that it would have predicted that we would have asked anyway. It is also problematic that the prediction is now inaccurate; because it has told us, we will act, possibly voiding any part of the prediction.

A very plausible but non-formal solution is to aim for a soft takeoff. For example, we could build a predictor that is not generally intelligent, and use it to investigate safe ways advance the situation. Perhaps we could use a sub-general intelligence to safely improve our own intelligence.

Have I missed any major examples in this post? Does “computation hazards” seem like a valid concept as distinct from other types of AI-risks?

References

Armstrong S., Sandberg A., Bostrom N. (2012). “Thinking inside the box: using and controlling an Oracle AI”. Minds and Machines, forthcoming.

Solomonoff, R., “A Formal Theory of Inductive Inference, Part I” Information and Control, Vol 7, No. 1 pp 1-22, March 1964.

Solomonoff, R., “A Formal Theory of Inductive Inference, Part II” Information and Control, Vol 7, No. 2 pp 224-254, June 1964.