Design, Implement and Verify

Taking a look at the latest posts here after a hiatus, I notice there is once again a lot of discussion about the problem of AI safety, clearly a cause for concern to people who believe AI to be an existential threat.

I personally think AI is not an existential threat, not because I believe the AI alignment problem is easier than Eliezer et al. do, but because I believe AGI is much harder. I was involved in some debate on that question a while back, but neither side was able to convince the other. I now think that’s because it’s unprovable; given the same data, the answer relies too heavily on intuition. Instead, I look for actions that are worth taking regardless of whether AGI is easy or hard.

One thing I do consider a matter for grave concern is the call to address the issue by shutting down AI research, and progress in computing in general. Of course there are short-term problems with this course of action, such as that any such shutdown is much more likely to be enforced in democracies than in dictatorships, which is very much not an outcome we should want.

The long-term problem with shutting down progress is that at very best, it just exchanges one form of suicide for another. Death is the default. Without progress, we remain trapped in a sealed box, wallowing in our own filth until something more mundane, like nuclear war or a pandemic, puts an end to our civilization. Once that happens, it’s game over. Even if our species survives the immediate disaster, all the easily accessible fossil fuel deposits are gone. There will be no new Renaissance and Industrial Revolution. We’ll be back to banging the rocks together until evolution finds a way to get rid of the overhead of general intelligence, then the sun autoclaves what’s left of the biosphere and the unobserved stars spend the next ten trillion years burning down to cinders.

(Unless AI alignment is so easy that humans can figure it out by pure armchair thought, in the absence of actually trying to develop AI. But for that to be the case, it would have to be much easier than many technical problems we’ve already solved. And if AI alignment were that easy, there would be no cause for concern in the first place.)

It is, admittedly, not as though there are no grounds for pessimism. The problem of AI alignment, as conceived by default, is impossible. That’s certainly grounds for pessimism!

The default way to think about it is straightforward. Friendliness is a predicate, a quality that an AI has or lacks: a function whose input is an AI and whose output is a Boolean. (The output could be more complex; it doesn’t change the conclusion.)

The problem with this formulation (or, let’s say, far from the least of its problems) is that the function Friendly(AI) is undecidable. Proof: a straightforward application of Rice’s theorem, which says that no non-trivial property of a program’s behavior (as opposed to its text) is decidable.
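To make the proof concrete, here is a minimal sketch of the standard reduction, in Python. Everything in it is hypothetical: friendly is the decider we are assuming exists, and the names inside the constructed program (run, do_something_unfriendly) are placeholders for running the target and for any behavior that counts as unfriendly.

```python
# Sketch: if a Friendliness decider existed, the halting problem
# would be decidable. All names here are hypothetical.

def friendly(program_source: str) -> bool:
    """Hypothetical decider: True iff the given program is Friendly.
    Assumed, for the sake of contradiction, to exist and always halt."""
    raise NotImplementedError  # the point is that no such function can exist

def halts(program_source: str, input_data: str) -> bool:
    """With `friendly` in hand, we could decide the halting problem."""
    # Construct a program that runs the target, then misbehaves only
    # if the target halted. It is unfriendly exactly when the target
    # halts (a program that loops forever harms no one).
    wrapper = (
        f"run({program_source!r}, {input_data!r})  # may loop forever\n"
        "do_something_unfriendly()  # reached only if the run halts\n"
    )
    return not friendly(wrapper)
```

Since the halting problem is undecidable, no total, always-correct friendly can exist over arbitrary programs.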

On the face of it, this proves too much; Rice’s theorem would seem to preclude writing any useful software. The trick is, of course, that we don’t start with an arbitrary program and try to prove it does what we want. We develop the software along with the understanding of why it does what we want and not what we don’t want, and preferably along with mechanical verification of some relevant properties like absence of various kinds of stray pointer errors. In other words, the design, implementation and verification all develop together.
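As a mundane illustration of verification developing alongside implementation, here is a sketch in Python using the hypothesis property-testing library; the merge function is just an example of mine, not anything from the original argument.

```python
from hypothesis import given, strategies as st

def merge(xs: list[int], ys: list[int]) -> list[int]:
    """Merge two sorted lists. Written with its specification in mind:
    the output is sorted and contains exactly the inputs' elements."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] <= ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]

# The machine-checked property lives next to the implementation,
# not bolted on after the fact.
@given(st.lists(st.integers()), st.lists(st.integers()))
def test_merge_matches_spec(xs: list[int], ys: list[int]) -> None:
    assert merge(sorted(xs), sorted(ys)) == sorted(xs + ys)
```

Nothing about this is exotic; the point is only that the property was cheap to state because it was stated while the code was being designed.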

This is not news to anyone who has worked in the software industry. The point is that – if and to the extent it exists at all – AI is software, and is subject to the same rules as any other software project: if you want something that reliably does what you want, design, implementation and verification need to go together. Put that way, it sounds obvious, but it’s easy to miss the implications.

It means there is no point trying to create a full-blown AGI by running an opaque optimization process – a single really big neural network, say, or a genetic algorithm with a very large population size – on a lot of hardware, and hoping something amazing jumps out. If I’m right about the actual difficulty of AGI, nothing amazing will happen, and if Eliezer et al. are right and brute-force AGI is relatively easy, the result won’t have the safety properties you want. (That doesn’t mean there aren’t valuable use cases for running a neural network on a lot of hardware. It does mean ‘if only we can throw enough hardware at this, maybe it will wake up and become conscious’ is not one of them.)

It means there is no point trying to figure out how to verify an arbitrarily complex, opaque blob after the fact. You can’t. Verification has to go in tandem with design and implementation. For example, from https://www.alignmentforum.org/posts/QEYWkRoCn4fZxXQAY/prizes-for-elk-proposals:

We suspect you can’t solve ELK just by getting better data—you probably need to “open up the black box” and include some term in the loss that depends on the structure of your model and not merely its behavior.

Yes. Indeed, this is still an understatement; ‘open up the black box’ is easy to interpret as meaning that you start off by being given a black box, and then begin to think about how to open it up. A better way to look at it is that you need to be thinking about how to figure out what’s going on in the box, in tandem with building the box in the first place.
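As a toy illustration of the shape of such an objective, here is a hedged sketch assuming PyTorch; the L1 penalty on the weights is merely the simplest possible structural term, a stand-in for whatever an actual ELK proposal would compute from the model’s internals.

```python
import torch

def total_loss(model: torch.nn.Module,
               behavioral_loss: torch.Tensor,
               structure_weight: float = 1e-3) -> torch.Tensor:
    """Combine an ordinary behavioral loss with a term computed from
    the model's parameters (its structure), not its input/output
    behavior."""
    # L1 sparsity looks at the weights themselves; no amount of
    # behavioral data alone could supply this term.
    structural_penalty = sum(p.abs().sum() for p in model.parameters())
    return behavioral_loss + structure_weight * structural_penalty
```

One term scores what the box does; the other inspects the box. The sketch says nothing about which structural term is the right one, which is of course the hard part.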

It means there is no point trying to solve the alignment problem by pure armchair thought. That would be like expecting Babbage and Lovelace to deduce Meltdown/Spectre. It’s not going to happen in the absence of actually designing and building systems.

It means suppressing development and shutting down progress is suicide. Death and extinction are the default outcomes. Whatever chance we have of turning our future light cone into a place where joy and wonder exist depends on making continued progress – quickly, before the window of opportunity slams shut.

‘Design, implement and verify’ sounds more difficult than just trying to do one of these things in isolation, but that’s an illusion. All three activities are necessary parts of the job, each depends on the others, and none will be successfully accomplished in isolation.