Complexity and Intelligence
One of the Godel-inspired challenges to the idea of self-improving minds is based on the notion of “complexity”.
Now “complexity”, as I’ve previously mentioned, is a dangerous sort of word. “Complexity” conjures up the image of a huge machine with incomprehensibly many gears inside—an impressive sort of image. Thanks to this impressiveness, “complexity” sounds like it could be explaining all sorts of things—that all sorts of phenomena could be happening because of “complexity”.
It so happens that “complexity” also names another meaning, strict and mathematical: the Kolmogorov complexity of a pattern is the size of the program code of the shortest Turing machine that produces the pattern as an output, given unlimited tape as working memory.
I immediately note that this mathematical meaning, is not the same as that intuitive image that comes to mind when you say “complexity”. The vast impressive-looking collection of wheels and gears? That’s not what the math term means.
Suppose you ran a Turing machine with unlimited tape, so that, starting from our laws of physics, it simulated our whole universe—not just the region of space we see around us, but all regions of space and all quantum branches. (There’s strong indications our universe may be effectively discrete, but if not, just calculate it out to 3^^^3 digits of precision.)
Then the “Kolmogorov complexity” of that entire universe—throughout all of space and all of time, from the Big Bang to whatever end, and all the life forms that ever evolved on Earth and all the decoherent branches of Earth and all the life-bearing planets anywhere, and all the intelligences that ever devised galactic civilizations, and all the art and all the technology and every machine ever built by those civilizations...
...would be 500 bits, or whatever the size of the true laws of physics when written out as equations on a sheet of paper.
The Kolmogorov complexity of just a single planet, like Earth, would of course be much higher than the “complexity” of the entire universe that contains it.
“Eh?” you say. “What’s this?” you say. “How can a single planet contain more wheels and gears, more complexity, than the whole vast turning universe that embeds it? Wouldn’t one planet contain fewer books, fewer brains, fewer species?”
But the new meaning that certain mathematicians have formulated and attached to the English word “complexity”, is not like the visually impressive complexity of a confusing system of wheels and gears.
It is, rather, the size of the smallest Turing machine that unfolds into a certain pattern.
If you want to print out the entire universe from the beginning of time to the end, you only need to specify the laws of physics.
If you want to print out just Earth by itself, then it’s not enough to specify the laws of physics. You also have to point to just Earth within the universe. You have to narrow it down somehow. And, in a big universe, that’s going to take a lot of information. It’s like the difference between giving directions to a city, and giving directions for finding a single grain of sand on one beach of one city. Indeed, with all those quantum branches existing side-by-side, it’s going to take around as much information to find Earth in the universe, as to just directly specify the positions of all the atoms.
Kolmogorov complexity is the sense at which we zoom into the endless swirls of the Mandelbrot fractal and think, not “How complicated”, but rather, “How simple”, because the Mandelbrot set is defined using a very simple equation. But if you wanted to find a subset of the Mandelbrot set, one particular swirl, you would have to specify where to look, and that would take more bits.
That’s why we use the Kolmogorov complexity in Occam’s Razor to determine how “complicated” or “simple” something is. So that, when we think of the entire universe, all the stars we can see and all the implied stars we can’t see, and hypothesize laws of physics standing behind it, we will think “what a simple universe” and not “what a complicated universe”—just like looking into the Mandelbrot fractal and thinking “how simple”. We could never accept a theory of physics as probably true, or even remotely possible, if it got an Occam penalty the size of the universe.
As a logical consequence of the way that Kolmogorov complexity has been defined, no closed system can ever increase in Kolmogorov complexity. (At least, no closed system without a ‘true randomness’ source.) A program can pattern ever more interacting wheels and gears into its RAM, but nothing it does from within itself can increase “the size of the smallest computer program that unfolds into it”, by definition.
Suppose, for example, that you had a computer program that defined a synthetic biology and a synthetic gene system. And this computer program ran through an artificial evolution that simulated 10^44 organisms (which is one estimate for the number of creatures who have ever lived on Earth), and subjected them to some kind of competition. And finally, after some predefined number of generations, it selected the highest-scoring organism, and printed it out. In an intuitive sense, you would (expect to) say that the best organisms on each round were getting more complicated, their biology more intricate, as the artificial biosystem evolved. But from the standpoint of Kolmogorov complexity, the final organism, even after a trillion years, is no more “complex” than the program code it takes to specify the tournament and the criterion of competition. The organism that wins is implicit in the specification of the tournament, so it can’t be any more “complex” than that.
But if, on the other hand, you reached into the biology and made a few million random changes here and there, the Kolmogorov complexity of the whole system would shoot way up: anyone wanting to specify it exactly, would have to describe the random changes that you made.
I specify “random” changes, because if you made the changes with beneficence aforethought—according to some criterion of goodness—then I could just talk about the compact criterion you used to make the changes. Only random information is incompressible on average, so you have to make purposeless changes if you want to increase the Kolmogorov complexity as fast as possible.
So! As you’ve probably guessed, the argument against self-improvement is that since closed systems cannot increase their “complexity”, the AI must look out upon the world, demanding a high-bandwidth sensory feed, in order to grow up.
If, that is, you believe that “increasing Kolmogorov complexity” is prerequisite to increasing real-world effectiveness.
(We will dispense with the idea that if system A builds system B, then system A must be “by definition” as smart as system B. This is the “Einstein’s mother must have been one heck of a physicist” sophistry. Even if a future planetary ecology is in some sense “implicit” in a single self-replicating RNA strand in some tidal pool, the ecology is a lot more impressive in a real-world sense: in a given period of time it can exert larger optimization pressures and do more neat stuff.)
Now, how does one argue that “increasing Kolmogorov complexity” has something to do with increasing intelligence? Especially when small machines can unfold into whole universes, and the maximum Kolmogorov complexity is realized by random noise?
One of the other things that a closed computable system provably can’t do, is solve the general halting problem—the problem of telling whether any Turing machine halts.
A similar proof shows that, if you take some given solver, and consider the maximum size bound such that the solver can solve the halting problem for all machines of that size or less, then this omniscience is bounded by at most the solver’s own complexity plus a small constant.
So… isn’t increasing your Kolmogorov complexity through outside sensory inputs, the key to learning to solve the halting problem for ever-larger systems?
And doesn’t this show that no closed system can “self-improve”?
In a word, no.
I mean, if you were to try to write it out as logic, you’d find that one of the steps involves saying, “If you can solve all systems of complexity N, you must be of complexity greater than N (maybe minus a small constant, depending on the details of the proof). Therefore, by increasing your complexity, you increase the range of systems you can solve.” This is formally a non-sequitur.
It’s also a non-sequitur in practice.
I mean, sure, if we’re not dealing with a closed system, you can’t prove that it won’t solve the halting problem. You could be looking at an external bright light in the sky that flashes on or off to reveal the halting solution.
But unless you already have that kind of mathematical ability yourself, you won’t know just from looking at the light that it’s giving you true solutions to the halting problem. You must have just been constructed with faith in the light, and the light must just happen to work.
(And in any computable universe, any light in the sky that you see won’t happen to solve the halting problem.)
It’s not easy for “sensory information” to give you justified new mathematical knowledge that you could not in principle obtain with your eyes closed.
It’s not a matter, for example, of seeing written in the sky a brilliant proof, that you would never have thought of on your own. A closed system with infinite RAM can close its eyes, and write out every possible sensory experience it could have, along with its own reactions to them, that could occur within some bounded number of steps. Doing this does not increase its Kolmogorov complexity.
So the notion can’t be that the environment tells you something that you recognize as a proof, but didn’t think of on your own. Somehow, having that sensory experience in particular, has to increase your mathematical ability even after you perfectly imagined that experience and your own reaction to it in advance.
Could it be the healing powers of having a larger universe to live in, or other people to talk to? But you can simulate incredibly large universes—vastly larger than anything we see in our telescopes, up-arrow large—within a closed system without increasing its Kolmogorov complexity. Within that simulation you could peek at people watching the stars, and peek at people interacting with each other, and plagiarize the books they wrote about the deep wisdom that comes from being embedded in a larger world.
What justified novel mathematical knowledge—about the halting problem in particular—could you gain from a sensory experience, that you could not gain from perfectly imagining that sensory experience and your own reaction to it, nor gain from peeking in on a simulated universe that included someone having that sensory experience?
Well, you can always suppose that you were born trusting the light in the sky, and the light in the sky always happens to tell the truth.
But what’s actually going on is that the non-sequitur is coming back to bite: Increasing your Kolmogorov complexity doesn’t necessarily increase your ability to solve math problems. Swallowing a bucket of random bits will increase your Kolmogorov complexity too.
You aren’t likely to learn any mathematics by gazing up at the external stars, that you couldn’t learn from “navel-gazing” into an equally large simulated universe. Looking at the actual stars around you is good for figuring out where in the universe you are (the extra information that specifies your location) but not much good for learning new math that you couldn’t learn by navel-gazing into a simulation as large as our universe.
In fact, it’s just bloody hard to fundamentally increase your ability to solve math problems in a way that “no closed system can do” just by opening the system. So far as I can tell, it basically requires that the environment be magic and that you be born with faith in this fact.
Saying that a 500-state Turing machine might be able to solve all problems up to at most 500 states plus a small constant, is misleading. That’s an upper bound, not a lower bound, and it comes from having a constructive way to build a specific unsolvable Turing machine out of the solver. In reality, you’d expect a 500-state Turing machine to get nowhere near solving the halting problem up to 500. I would drop dead of shock if there were a 500-state Turing machine that solved the halting problem for all the Turing machines up to 50 states. The vast majority of 500-state Turing machines that implement something that looks like a “halting problem solver” will go nowhere near 500 states (but see this comment).
Suppose you write a relatively short Turing machine that, by virtue of its unlimited working memory, creates an entire universe containing googols or up-arrows of atoms...
...and within this universe, life emerges on some planet-equivalent, and evolves, and develops intelligence, and devises science to study its ecosphere and its universe, and builds computers, and goes out into space and investigates the various physical systems that have formed, and perhaps encounters other life-forms...
...and over the course of trillions or up-arrows of years, a transgalactic intrauniversal economy develops, with conduits conducting information from one end of the universe to another (because you didn’t pick a physics with inconvenient lightspeed limits), a superWeb of hyperintelligences all exchanging information...
...and finally—after a long, long time—your computer program blinks a giant message across the universe, containing a problem to be solved and a specification of how to answer back, and threatening to destroy their universe if the answer is wrong...
...then this intrauniversal civilization—and everything it’s ever learned by theory or experiment over the last up-arrow years—is said to contain 400 bits of complexity, or however long the original program was.
But where did it get its math powers, from inside the closed system?
If we trace back the origins of the hypergalactic civilization, then every belief it ever adopted about math, came from updating on some observed event. That might be a star exploding, or it might be the output of a calculator, or it might be an observed event within some mind’s brain… but in all cases, the update will occur because of a logic that says, “If I see this, it means that”. Before you can learn, you must have the prior that structures learning. If you see something that makes you decide to change the way you learn, then you must believe that seeing X implies you should learn a different way Y. That’s how it would be for superintelligences, I expect.
If you keep tracing back through that simulated universe, you arrive at something before the dawn of superintelligence—the first intelligent beings, produced by evolution. These first minds are the ones who’ll look at Peano Arithmetic and think, “This has never produced a contradiction, so it probably never will—I’ll go ahead and program that into my AI.” These first minds are the ones who’ll think, “Induction seems like it works pretty well, but how do I formalize the notion of induction?” And these first minds are the ones who’ll think, “If I build a self-improving AI, how should it update itself—including changing its own updating process—from the results of observation?”
And how did the first minds get the ability to think those thoughts? From natural selection, that generated the adaptations that executed to think all those thoughts, using the simple evolutionary rule: “keep what works”.
And in turn, natural selection in this universe popped out of the laws of physics.
So everything that this hypergalactic civilization ever believes about math, is really just induction in one form or another. All the evolved adaptations that do induction, produced by inductive natural selection; and all the generalizations made from experience, including generalizations about how to form better generalizations. It would all just unfold out of the inductive principle...
...running in a box sealed as tightly shut as our own universe appears to be.
And I don’t see how we, in our own closed universe, are going to do any better. Even if we have the ability to look up at the stars, it’s not going to give us the ability to go outside that inductive chain to obtain justified mathematical beliefs.
If you wrote that 400-bit simulated universe over the course of a couple of months using human-level intelligence and some mysterious source of unlimited computing power, then you are much more complex than that hypergalactic civilization. You take much more than 400 bits to find within the space of possibilities, because you are only one particular brain.
But y’know, I think that your mind, and the up-arrow mind of that inconceivable civilization, would still be worth distinguishing as Powers. Even if you can figure out how to ask them questions. And even if you’re asking them questions by running an internal simulation, which makes it all part of your own “complexity” as defined in the math.
To locate a up-arrow-sized mind within an up-arrow-sized civilization, would require up-arrow bits—even if the entire civilization unfolded out of a 400-bit machine as compact as our own laws of physics. But which would be more powerful, that one “complex” mind, or the “simple” civilization it was part of?
None of this violates Godelian limitations. You can transmit to the hypergalactic civilization a similar Turing machine to the one that built it, and ask it how that Turing machine behaves. If you can fold a hypergalactic civilization into a 400-bit Turing machine, then even a hypergalactic civilization can confront questions about the behavior of 400-bit Turing machines that are real stumpers.
And 400 bits is going to be an overestimate. I bet there’s at least one up-arrow-sized hypergalactic civilization folded into a halting Turing machine with 15 states, or something like that. If that seems unreasonable, you are not acquainted with the Busy-Beaver problem.
You can get a hell of a lot of mathematical ability out of small Turing machines that unfold into pangalactic hypercivilizations. But unfortunately, there are other small Turing machines that are hellishly difficult problems—perhaps unfolding into hypercivilizations themselves, or things even less comprehensible. So even the tremendous mathematical minds that can unfold out of small Turing machines, won’t be able to solve all Turing machines up to a larger size bound. Hence no Godelian contradiction.
(I wonder: If humanity unfolded into a future civilization of infinite space and infinite time, creating descendants and hyperdescendants of unlimitedly growing size, what would be the largest Busy Beaver number ever agreed upon? 15? Maybe even as large as 20? Surely not 100 - you could encode a civilization of similar origins and equivalent power into a smaller Turing machine than that.)
Olie Lamb said: “I don’t see anything good about complexity. There’s nothing artful about complexity. There’s nothing mystical about complexity. It’s just complex.” This is true even when you’re talking about wheels and gears, never mind Kolmogorov complexity. It’s simplicity, not complexity, that is the admirable virtue.
The real force behind this whole debate is that the word “complexity” sounds impressive and can act as an explanation for anything you don’t understand. Then the word gets captured by a mathematical property that’s spelled using the same letters, which happens to be provably constant for closed systems.
That, I think, is where the argument really comes from, as it rationalizes the even more primitive intuition of some blind helpless thing in a box.
This argument is promulgated even by some people who can write proofs about complexity—but frankly, I do not think they have picked up the habit of visualizing what their math symbols mean in real life. That thing Richard Feynman did, where he visualized two balls turning colors or growing hairs? I don’t think they do that. On the last step of interpretation, they just make a quick appeal to the sound of the words.
But I will agree that, if the laws we know are true, then a self-improving mind which lacks sensory inputs, shouldn’t be able to improve its mathematical abilities beyond the level of a up-arrow-sized civilization—for example, it shouldn’t be able to solve Busy-Beaver(100).
It might perhaps be more limited than this in mere practice, if it’s just running on a laptop computer or something. But if theoretical mathematical arguments about closed systems show anything, that is what they show.