Against Instrumental Convergence
Instrumental convergence is the idea that every sufficiently intelligent agent would exhibit behaviors such as self-preservation or resource acquisition. This is a natural result for maximizers of simple utility functions. However, I claim that it rests on a false premise: that agents must have goals.
What are goals?
For agents constructed as utility maximizers, the goal coincides with utility maximization. There is no doubt that for most utility functions, a utility maximizer would exhibit instrumental convergence. However, I claim that most minds in the mindspace are not utility maximizers in the usual sense.
It may be true that for every agent there is a utility function that it maximizes, in the spirit of VNM utility. However, these utility functions do not coincide with goals in the sense that instrumental convergence requires. These functions are merely encodings of the agent’s decision algorithm and are no less complex than the agent itself. No simple conclusions can be drawn from their existence.
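To make the "merely an encoding" point concrete, here is a minimal sketch. It is not a VNM construction (VNM is about preferences over lotteries); it only shows, under my own toy assumptions, that any deterministic decision rule can be wrapped in a utility function that it maximizes. The names `utility_from_policy`, `maximize`, `quirky_policy`, and the action list are all hypothetical and chosen for illustration.

```python
from typing import Callable, Hashable, Sequence

State = Hashable
Action = Hashable


def utility_from_policy(
    policy: Callable[[State], Action],
) -> Callable[[State, Action], float]:
    """Build a utility function that the given policy trivially maximizes."""

    def utility(state: State, action: Action) -> float:
        # Utility 1 for whatever the policy would do, 0 for everything else.
        # Note that evaluating the utility requires running the policy itself.
        return 1.0 if action == policy(state) else 0.0

    return utility


def maximize(
    utility: Callable[[State, Action], float],
    state: State,
    actions: Sequence[Action],
) -> Action:
    """Pick the action with the highest utility in this state."""
    return max(actions, key=lambda a: utility(state, a))


# Hypothetical agent: an arbitrary rule with no obvious "goal".
ACTIONS = ["run_factory", "idle", "write_fanfic"]


def quirky_policy(state: int) -> str:
    return ACTIONS[state % len(ACTIONS)]


quirky_utility = utility_from_policy(quirky_policy)

# The "utility maximizer" and the original agent behave identically.
for s in range(10):
    assert maximize(quirky_utility, s, ACTIONS) == quirky_policy(s)
```

The utility function here contains the whole policy, so it is at least as complex as the agent, and knowing that it exists tells us nothing about whether the agent will acquire resources or protect itself.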
Humans exhibit goals in the usual sense and arguably have VNM utility functions. Human goals often involve maximizing some quantity, e.g. money. However, explicit goals represent only a small fraction of a human’s total utility computation. Presumably, the goals explain the extent to which some humans exhibit instrumental convergence, and the rest of the utility function explains why we haven’t yet tiled the universe with money.
What about non-human non-utility-maximizer agents? Certainly, some of them can still exhibit instrumental convergence, but I will argue that this is rare.
What is the average mind in the mindspace?
The “mindspace” refers to some hypothetical set of functions or algorithms, possibly selected to meet some arbitrary definition of intelligence. I claim that most minds in the mindspace do not share any properties that we have not selected for. In fact, most minds in the mindspace do nothing even remotely useful. Even if we explicitly selected a random mind from a useful subset of the mindspace, the mind would most likely do nothing more than the bare minimum we required. For example, if we search for minds that are able to run a paperclip factory, we will find minds that run paperclip factories well enough to pass our test, but not any better. Intelligence is defined by the ability to solve problems, not by the degree of agency.
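The claim that selected minds "do not share any properties that we have not selected for" can be illustrated with a toy enumeration. This is only a sketch under strong assumptions of my own: a "mind" is a tuple of independent boolean traits, the test looks at the first few traits only, and every combination of traits is equally represented. The trait counts and the `ACQUIRES_RESOURCES` index are hypothetical.

```python
from itertools import product

N_TESTED = 3    # traits our test checks (hypothetical)
N_UNTESTED = 3  # traits our test never looks at (hypothetical)


def all_minds():
    """Enumerate every combination of traits: a tiny stand-in for the mindspace."""
    return list(product([False, True], repeat=N_TESTED + N_UNTESTED))


def passes_test(mind):
    """The test only inspects the tested traits and requires all of them."""
    return all(mind[:N_TESTED])


def fraction_with_trait(minds, index):
    """Share of the given minds that have the trait at this index."""
    return sum(1 for m in minds if m[index]) / len(minds)


minds = all_minds()
passing = [m for m in minds if passes_test(m)]

# Index of one untested trait, standing in for "acquires resources".
ACQUIRES_RESOURCES = N_TESTED

base_rate = fraction_with_trait(minds, ACQUIRES_RESOURCES)
selected_rate = fraction_with_trait(passing, ACQUIRES_RESOURCES)

# Selecting on the tested traits leaves the untested trait at its base rate.
assert base_rate == selected_rate
```

The specific rate in this toy means nothing, since it is an artifact of giving every trait even odds. What the sketch shows is only that filtering on tested traits does not shift an untested trait when the two are independent. It assumes that independence rather than demonstrating it; if passing the test were correlated with the untested trait, the two rates would differ.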
Without a doubt, somewhere in the mindspace there is a mind that will run the paperclip factory, acquire resources, and eventually tile the universe with paperclips. However, it is not the only mind out there. There is also a mind that runs the paperclip factory and then, when it has nothing better to do, shuts down, sits in an empty loop, dreams of butter, or generates bad Harry Potter fanfic.
With this in mind, random searches in the mindspace are relatively safe, even if the minds we find aren’t well aligned. It would still be lovely, though, to be certain that a new mind is not the “tile the universe with paperclips” kind.
Caveats and half-baked ideas
This post comes from trying to understand why I don’t find the threat of AI as inevitable as some suggest. In other words, it’s a rationalization.
I have a limited understanding of what MIRI does and what assumptions it makes. I’m under the impression that they are primarily working on utility maximizers, and that instrumental convergence is important to them. But it’s likely that points similar to mine have been made and either accepted or countered.
In this post I make empirical claims about the composition of the mindspace. I have obviously not verified them, if they are even verifiable. The claims seem trivial to me, but may well not be that strong.
While simple utility-maximizing minds are not common, it’s possible that they are the smallest minds that can pass our tests, or that they have other special properties that would make semi-random searches find them more often than we’d like.