Limits on self-optimisation

Disclaimer: I am a physicist, and in the field of computer science my scholarship is weak. It may be that what I suggest here is well known, or perhaps just wrong.

Abstract: A Turing machine capable of saying whether two arbitrary Turing machines have the same output for all inputs is equivalent to solving the Halting Problem. To optimise a function it is necessary to prove that the optimised version always has the same output as the unoptimised version, which is impossible in general for Turing machines. However, real computers have finite input spaces.

Context: FOOM, Friendliness, optimisation processes.

Consider a computer program which modifies itself in an attempt to optimise for speed. A modification to some algorithm is *proper* if it results, for all inputs, in the same output; it is an optimisation if it results in a shorter running time on average for typical inputs, and a *strict* optimisation if it results in a shorter running time for all inputs.

A Friendly AI, optimising itself, must ensure that it remains Friendly after the modification; it follows that it can only make proper modifications. (When calculating a CEV it may make improper modifications, since the final answer for “How do we deal with X” may change in the course of extrapolating; but for plain optimisations the answer cannot change.)

For simplicity we may consider that the output of a function can be expressed as a single bit; the extension to many bits is obvious. However, in addition to ‘0’ and ‘1’ we must consider that the response to some input can be “does not terminate”. The task is to prove that two functions, which we may consider as Turing machines, have the same output for all inputs.

Now, suppose you have a Turing machine that takes as input two arbitrary Turing machines and their respective tapes, and outputs “1” if the two input machines have the same output, and “0″ otherwise. Then, by having one of the inputs be a Turing machine which is known not to terminate—one that executes an infinite loop—you can solve the Halting Problem. Therefore, such a machine cannot exist: You cannot build a Turing machine to prove, for arbitrary input machines, that they have the same output.

It seems to follow that you cannot build a fully general proper-optimisation detector.

However, “arbitrary Turing machines” is a strong claim, in fact stronger than we require. No physically realisable computer is a true Turing machine, because it cannot have infinite storage space, as the definition requires. The problem is actually the slightly easier (that is, not *provably* impossible) one of making a proper-optimisation detector for the space of possible inputs to an actual computer, which is finite though very large. In practice we may limit the input space still further by considering, say, optimisations to functions whose input is two 64-bit numbers, or something. Even so, the brute-force solution of running the functions on all possible inputs and comparing is already rather impractical.