Permitted Possibilities, & Locality

Continuation of: Hard Takeoff

The analysis given in the last two days permits more than one possible AI trajectory:

  1. Programmers, smarter than evolution at finding tricks that work, but operating without fundamental insight or with only partial insight, create a mind that is dumber than the researchers but performs lower-quality operations much faster. This mind reaches k > 1, cascades up to the level of a very smart human, itself achieves insight into intelligence, and undergoes the really fast part of the FOOM, to superintelligence. This would be the major nightmare scenario for the origin of an unFriendly AI. (A minimal numerical sketch of this cascade criterion follows the list.)

  2. Programmers operating with partial insight create a mind that performs a number of tasks very well, but can’t really handle self-modification, let alone AI theory. A mind like this might progress with something like smoothness, pushed along by the researchers rather than by itself, even all the way up to average-human capability—not having the insight into its own workings to push itself any further. We also suppose that the mind is either already using huge amounts of available hardware, or scales very poorly, so it cannot go FOOM just by adding a hundred times as much hardware. This scenario seems less likely to my eyes, but it is not ruled out by any effect I can see.

  3. Programmers operating with strong insight into intelligence directly create, along an efficient and planned pathway, a mind capable of modifying itself with deterministic precision—provably correct or provably noncatastrophic self-modifications. This is the only way I can see to achieve narrow enough targeting to create a Friendly AI. The “natural” trajectory of such an agent would be slowed by the requirements of precision, and sped up by the presence of insight; but because this is a Friendly AI, notions like “You can’t yet improve yourself this far; your goal system isn’t verified enough” would play a role.
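
As a minimal numerical sketch of the cascade criterion behind scenario 1 (a toy model only, with k read as the average number of further improvements that each improvement unlocks; the numbers are illustrative, not estimates):

```python
def cascade(k, seed=1.0, generations=30):
    """Toy model: each round of self-improvement unlocks k times as many
    improvements as the last round. Returns the cumulative total found."""
    total, current = 0.0, seed
    for _ in range(generations):
        total += current
        current *= k
    return total

# k < 1: the total converges toward seed / (1 - k); k > 1: it compounds without bound.
for k in (0.5, 0.9, 1.1, 1.5):
    print(f"k = {k}: {cascade(k):,.1f} improvements after 30 rounds")
```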

So these are some things that I think are permitted to happen, though case 2 would count as a hit against me to some degree, since it does seem unlikely.

Here are some things that shouldn’t happen, on my analysis:

  • An ad-hoc self-modifying AI as in (1) undergoes a cycle of self-improvement, starting from stupidity, that carries it up to the level of a very smart human—and then stops, unable to progress any further. (The upward slope in this region is supposed to be very steep!)

  • A mostly non-self-modifying AI as in (2) is pushed by its programmers up to a roughly human level… then to the level of a very smart human… then to the level of a mild transhuman… but the mind still does not achieve insight into its own workings and still does not undergo an intelligence explosion—just continues to increase smoothly in intelligence from there.

And I also don’t think this is allowed:

  • The “scenario that Robin Hanson seems to think is the line-of-maximum-probability for AI as heard and summarized by Eliezer Yudkowsky”:

    • No one AI that does everything humans do, but rather a large, diverse population of AIs. These AIs have various domain-specific competencies that are “human+ level”—not just in the sense of Deep Blue beating Kasparov, but in the sense that in these domains, the AIs seem to have good “common sense” and can e.g. recognize, comprehend and handle situations that weren’t in their original programming. But only in the special domains for which that AI was crafted/​trained. Collectively, these AIs may be strictly more competent than any one human, but no individual AI is more competent than any one human.

    • Knowledge and even skills are widely traded in this economy of AI systems.

    • In concert, these AIs, and their human owners, and the economy that surrounds them, undergo a collective FOOM of self-improvement. No local agent is capable of doing all this work, only the collective system.

    • The FOOM’s benefits are distributed through a whole global economy of trade partners and suppliers, including existing humans and corporations, though existing humans and corporations may form an increasingly small fraction of the New Economy.

    • This FOOM looks like an exponential curve of compound interest, like the modern world but with a substantially shorter doubling time.

Mostly, Robin seems to think that uploads will come first, but that’s a whole ’nother story. So far as AI goes, this looks like Robin’s maximum line of probability—and if I got this mostly wrong or all wrong, that’s no surprise. Robin Hanson did the same to me when summarizing what he thought were my own positions. I have never thought, in prosecuting this Disagreement, that we were starting out with a mostly good understanding of what the Other was thinking; and this seems like an important thing to have always in mind.

So—bearing in mind that I may well be criticizing a straw misrepresentation, and that I know this full well, but am simply making my best guess—here’s what I see as wrong with the elements of this scenario:

• The abilities we call “human” are the final products of an economy of mind—not in the sense that there are selfish agents in it, but in the sense that there are production lines; and I would even expect evolution to enforce something approaching fitness as a common unit of currency. (Enough selection pressure to create an adaptation from scratch should be enough to fine-tune the resource curves involved.) It’s the production lines, though, that are the main point—that your brain has specialized parts and the specialized parts pass information around. All of this goes on behind the scenes, but it’s what finally adds up to any single human ability.

In other words, trying to get humanlike performance in just one domain is divorcing a final product of that economy from all the work that stands behind it. It’s like having a global economy that can only manufacture toasters, but not dishwashers or light bulbs. You can have something like Deep Blue that beats humans at chess in an inhuman, specialized way; but I don’t think it would be easy to get humanish performance at, say, biology R&D, without a whole mind and architecture standing behind it, one that would also be able to accomplish other things. Tasks that draw on our cross-domain-ness, or our long-range real-world strategizing, or our ability to formulate new hypotheses, or our ability to use very high-level abstractions—for these, I don’t think you would be able to replace a human in just that one job without also having something that would be able to learn many different jobs.

I think it is a fair analogy: you shouldn’t expect to see a global economy that can manufacture toasters but cannot manufacture anything else.

This is why I don’t think we’ll see a system of AIs that are diverse, individually highly specialized, and only collectively able to do anything a human can do.

• Trading cognitive content around between diverse AIs is more difficult and less likely than it might sound. Consider the field of AI as it works today. Is there any standard database of cognitive content that you buy off the shelf and plug into your amazing new system, whether it be a chessplayer or a new data-mining algorithm? If it’s a chess-playing program, there are databases of stored games—but that’s not the same as having databases of preprocessed cognitive content.

So far as I can tell, the diversity of cognitive architectures acts as a tremendous barrier to trading around cognitive content. If you have many AIs around that are all built on the same architecture by the same programmers, they might, with a fair amount of work, be able to pass around learned cognitive content. Even this is less trivial than it sounds. If two AIs both see an apple for the first time, and they both independently form concepts about that apple, and they both independently build some new cognitive content around those concepts, then their thoughts are effectively written in a different language. By seeing a single apple at the same time, they could identify a concept they both have in mind, and in this way build up a common language...

...the point being that even when two separated minds are running literally the same source code, it is still difficult for them to trade new knowledge as raw cognitive content without having a special language designed just for sharing knowledge.
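
A minimal sketch of the problem, using toy agents I am inventing for illustration: each copy assigns arbitrary internal identifiers to the concepts it forms, so its raw cognitive content is opaque to its twin until they ground a shared vocabulary against jointly observed examples, one concept at a time.

```python
import itertools

class Agent:
    """Toy learner: assigns an arbitrary internal ID to each new percept."""
    _ids = itertools.count()          # shared counter, so IDs differ across agents

    def __init__(self):
        self.concepts = {}            # percept -> internal concept ID
        self.knowledge = {}           # internal concept ID -> learned facts

    def observe(self, percept):
        if percept not in self.concepts:
            self.concepts[percept] = f"c{next(self._ids)}"
        return self.concepts[percept]

    def learn(self, percept, fact):
        self.knowledge.setdefault(self.observe(percept), []).append(fact)

a, b = Agent(), Agent()
a.learn("apple", "is edible")
b.learn("apple", "grows on trees")

# Each agent's raw content is keyed by its own private IDs:
print(a.knowledge)                    # {'c0': ['is edible']}
print(b.knowledge)                    # {'c1': ['grows on trees']}

# Grounding step: show both agents the same percept to build a translation table.
translation = {a.observe("apple"): b.observe("apple")}
print(translation)                    # {'c0': 'c1'}, one shared concept at a time
```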

Now suppose the two AIs are built around different architectures.

The barrier this opposes to a true, cross-agent, literal “economy of mind” is so strong that, in the vast majority of AI applications you might set out to write today, you would not bother to import any standardized preprocessed cognitive content. It will be easier for your AI application to start with some standard examples—databases of that sort do exist, in some fields anyway—and redo all the cognitive work of learning on its own.

That’s how things stand today.

And I have to say that looking over the diversity of architectures proposed at any AGI conference I’ve attended, it is very hard to imagine directly trading cognitive content between any two of them. It would be an immense amount of work just to set up a language in which they could communicate what they take to be facts about the world—never mind preprocessed cognitive content.

This is a force for localization: unless the condition I have just described changes drastically, it means that agents will be able to do their own cognitive labor, rather than needing to get their brain content manufactured elsewhere, or even being able to get their brain content manufactured elsewhere. I can imagine there being an exception to this for non-diverse agents that are deliberately designed to carry out this kind of trading within their code-clade. (And in the long run, difficulties of translation seem less likely to stop superintelligences.)

But in today’s world, it seems to be the rule that when you write a new AI program, you can sometimes get preprocessed raw data, but you will not buy any preprocessed cognitive content—the internal content of your program will come from within your program.
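
As a toy illustration of that rule (hypothetical learners, not any real system): two programs with different internal architectures can both consume the same shared raw data, but the cognitive content each one ends up with is useless to the other.

```python
# Shared raw data: anyone can use this.
examples = [(1.0, "A"), (1.2, "A"), (3.8, "B"), (4.1, "B")]

class CentroidLearner:
    """Stores one mean value per label."""
    def fit(self, data):
        sums, counts = {}, {}
        for x, y in data:
            sums[y] = sums.get(y, 0.0) + x
            counts[y] = counts.get(y, 0) + 1
        self.centroids = {y: sums[y] / counts[y] for y in sums}
    def predict(self, x):
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))

class ExemplarLearner:
    """Stores every example verbatim (nearest neighbor)."""
    def fit(self, data):
        self.memory = list(data)
    def predict(self, x):
        return min(self.memory, key=lambda xy: abs(x - xy[0]))[1]

a, b = CentroidLearner(), ExemplarLearner()
a.fit(examples)                      # both can redo the learning from shared raw data...
b.fit(examples)
print(a.predict(2.0), b.predict(2.0))  # ...and both answer 'A', each in its own way.

# ...but their internal content is not interchangeable:
print(a.centroids)                   # a small dict of per-label means
print(b.memory)                      # the stored raw exemplars
# b has no use for a.centroids, and a has no slot for b.memory.
```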

And it actually does seem to me that AI would have to get very sophisticated before it got over the “hump” of increased sophistication making sharing harder instead of easier. I’m not sure this is pre-takeoff sophistication we’re talking about, here. And the cheaper computing power is, the easier it is to just share the data and do the learning on your own.

Again—in today’s world, sharing of cognitive content between diverse AIs doesn’t happen, even though there are lots of machine learning algorithms out there doing various jobs. You could say things would happen differently in the future, but it’d be up to you to make that case.

• Understanding the difficulty of interfacing diverse AIs is the next step toward understanding why it’s likely to be a single coherent cognitive system that goes FOOM via recursive self-improvement. The same sort of barriers that apply to trading direct cognitive content would also apply to trading changes in cognitive source code.

It’s a whole lot easier to modify the source code in the interior of your own mind, than to take that modification, and sell it to a friend who happens to be written on different source code.

Certain kinds of abstract insights would be more tradeable among sufficiently sophisticated minds, and the major insights might be well worth selling—for instance, if you invented a new general algorithm for some subtask that many minds perform. But if you again look at the modern state of the field, you find that only a few algorithms get any sort of general uptake.

And if you hypothesize minds that understand these algorithms, and the improvements to them, and what the algorithms are for, and how to implement and engineer them—then these are already very sophisticated minds; at this point, they are AIs that can do their own AI theory. So for this scenario to hold, the hard takeoff must somehow have failed to start by the point where there are many AIs around that can do AI theory. And if they can’t do AI theory, diverse AIs are likely to have great difficulty trading code improvements among themselves.

This is another localizing force. It means that the improvements you make to yourself, and the compound interest earned on those improvements, are likely to stay local.

If the scenario of an AI takeoff is anything at all like the modern world, in which all the attempted AGI projects have completely incommensurable architectures, then any self-improvements will definitely stay put rather than spread.

• But suppose that the situation did change drastically from today, and that you had a community of diverse AIs which were sophisticated enough to share cognitive content, code changes, and even insights. And suppose even that this is true at the start of the FOOM—that is, the community of diverse AIs got all the way up to that level, without yet using a FOOM or starting a FOOM at a time when it would still be localized.

We can even suppose that most of the code improvements, algorithmic insights, and cognitive content driving any particular AI are coming from outside that AI—sold or shared—so that the improvements the AI makes to itself do not dominate its total velocity.

Fine. The humans are not out of the woods.

Even if we’re talking about uploads, it will be immensely more difficult to apply any of the algorithmic insights that are tradeable between AIs to the undocumented human brain: a huge mass of spaghetti code that was never designed to be upgraded, that is not end-user-modifiable, that is not hot-swappable, and that is written for a completely different architecture than what runs efficiently on modern processors...

And biological humans? Their neurons just go on doing whatever neurons do, at 100 cycles per second (tops).

So this FOOM that follows from recursive self-improvement, the cascade effect of using your increased intelligence to rewrite your code and make yourself even smarter -

The barriers to sharing cognitive improvements among diversely designed AIs are large; the barriers to sharing with uploaded humans are incredibly huge; the barrier to sharing with biological humans is essentially absolute. (Barring a (benevolent) superintelligence with nanotechnology, but if one of those is around, you have already won.)

In this hypothetical global economy of mind, the humans are like a country that no one can invest in, that cannot adopt any of the new technologies coming down the line.

I once observed that Ricardo’s Law of Comparative Advantage is the theorem that unemployment should not exist. The gotcha is that if someone is sufficiently unreliable, there is a cost to you to train them, a cost to stand over their shoulder and monitor them, a cost to check their results for accuracy. The existence of unemployment in our world is a combination of transaction costs like taxes, regulatory barriers like the minimum wage, and, above all, lack of trust. There are a dozen things I would pay someone else to do for me—if I weren’t paying taxes on the transaction, and if I could trust a stranger as much as I trust myself (both in terms of their honesty and of acceptable quality of output). Heck, I’d as soon have some formerly unemployed person walk in and spoon food into my mouth while I kept on typing at the computer—if there were no transaction costs, and I trusted them.
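
To put toy numbers on the gotcha (every figure below is an assumption I am inventing for illustration): Ricardo guarantees that some mutually beneficial trade exists on paper, but the gain has to survive taxes, monitoring, and error-checking, and at a large enough speed or reliability gap it does not.

```python
# Illustrative numbers only (not estimates): can a fast mind gain by delegating
# an hour of some chore to a biological human, as Ricardo says it should?
value_of_humans_output = 1.0      # worth of the human's hour of work, to the fast mind
transaction_tax = 0.3             # taxes and other per-trade frictions, as a fraction
monitoring_hours = 0.01           # fast-mind hours spent explaining, checking, correcting
fast_mind_hourly_value = 1000.0   # what an hour of the fast mind's own time is worth

gain = value_of_humans_output * (1 - transaction_tax)
overhead = monitoring_hours * fast_mind_hourly_value
print(f"gain {gain:.2f} vs. overhead {overhead:.2f}; trade happens: {gain > overhead}")
# gain 0.70 vs. overhead 10.00; trade happens: False
```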

If high-quality thought speeds up to something a few orders of magnitude closer to computer time, no one is going to spend a subjective year explaining to a biological human an idea that the human will be barely able to grasp, in exchange for an even slower guess at an answer that is probably going to be wrong anyway.
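
A quick check of that subjective-year figure, under an assumed speedup (the factor of 10,000 is my own illustration of “a few orders of magnitude”):

```python
speedup = 10_000                  # assumed subjective speedup of the fast mind
wall_clock_hours = 1.0            # one hour of real-time conversation with a human
subjective_hours = wall_clock_hours * speedup
print(subjective_hours / 8_766)   # ~1.14 subjective years per wall-clock hour
```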

Even uploads could easily end up doomed by this effect, not just because of the immense overhead cost and slowdown of running their minds, but because of the continuing error-proneness of the human architecture. Who’s going to trust a giant messy undocumented neural network, any more than you’d run right out and hire some unemployed guy off the street to come into your house and do your cooking?

This FOOM leaves humans behind -

- unless you go the route of Friendly AI, and make a superintelligence that simply wants to help humans, not for any economic value that humans provide to it, but because that is its nature.

And just to be clear on something—which really should be clear by now, from all my other writing, but maybe you’re just wandering in—it’s not that having squishy things running around on two legs is the ultimate height of existence. But if you roll up a random AI with a random utility function, it just ends up turning the universe into patterns we would not find very eudaimonic—turning the galaxies into paperclips. And if you try a haphazard attempt at making a “nice” AI, the sort of not-even-half-baked theory I see people coming up with on the spot and occasionally writing whole books about (like using reinforcement learning on pictures of smiling humans to train the AI to value happiness; yes, this was a book), then the AI just transforms the galaxy into tiny molecular smileyfaces...

It’s not some small, mean desire to survive for myself at the price of greater possible futures that motivates me. The thing is—those greater possible futures, they don’t happen automatically. There are stakes on the table that are so much an invisible background of your existence that it would never occur to you they could be lost; and these things will be shattered by default, if not specifically preserved.

• And as for the idea that the whole thing would happen slowly enough for humans to have plenty of time to react to things—a smooth exponential shifted into a shorter doubling time—of that, I spoke yesterday. Progress seems to be exponential now, more or less, or at least accelerating, and that’s with constant human brains. If you take a nonrecursive accelerating function and fold it in on itself, you are going to get superexponential progress. “If computing power doubles every eighteen months, what happens when computers are doing the research” should not just be a faster doubling time. (Though, that said, on any sufficiently short timescale, progress might well locally approximate an exponential because investments will shift in such fashion that the marginal returns on investment balance, even in the interior of a single mind; interest rates consistent over a timespan imply smooth exponential growth over that timespan.)
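
Here is a minimal sketch of that distinction (a toy model with illustrative constants, not a forecast): hold researcher speed fixed and computing power P follows dP/dt = c*P, a clean exponential with a constant doubling time; let the researchers themselves run on P and you get dP/dt = c*P^2, whose doubling time shrinks with every doubling and which blows up in finite time.

```python
# Toy comparison (illustrative constants only): exponential vs. folded-in growth.
def simulate(feedback, steps=2000, dt=0.01, c=0.5):
    """Euler-integrate dP/dt = c * P     (feedback=False: constant researchers)
                     or dP/dt = c * P**2 (feedback=True: researchers run on P)."""
    p, doublings, next_double = 1.0, [], 2.0
    for step in range(steps):
        p += dt * (c * p * p if feedback else c * p)
        if p >= next_double:
            doublings.append(round(step * dt, 2))   # time of each doubling
            next_double *= 2
        if p > 1e12:
            break
    return doublings

print("constant researchers :", simulate(False)[:5])  # evenly spaced doublings
print("researchers run on P :", simulate(True)[:5])   # doublings bunch together
```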

You can’t count on warning, or time to react. If an accident sends a sphere of plutonium not merely critical but prompt critical, neutron output can double in a tenth of a second even with k = 1.0006. It can deliver a killing dose of radiation or blow the top off a nuclear reactor before you have time to draw a breath. Computers, like neutrons, already run on a timescale much faster than human thinking. We are already past the world where we can definitely count on having time to react.
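
The arithmetic behind that figure, under an assumption I am supplying (a prompt-neutron generation time on the order of 1e-4 seconds, roughly right for a thermal reactor; a bare metal sphere would be far faster):

```python
import math

k = 1.0006                 # neutron multiplication factor per generation
generation_time = 1e-4     # seconds per generation; assumed thermal-reactor figure

generations_to_double = math.log(2) / math.log(k)     # ~1156 generations
doubling_time = generations_to_double * generation_time
print(f"{generations_to_double:.0f} generations, doubling time ~{doubling_time:.2f} s")
# -> ~1156 generations, doubling time ~0.12 s
```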

When you move into the transhuman realm, you also move into the realm of adult problems. To wield great power carries a price in great precision. You can build a nuclear reactor but you can’t ad-lib it. On the problems of this scale, if you want the universe to end up a worthwhile place, you can’t just throw things into the air and trust to luck and later correction. That might work in childhood, but not on adult problems where the price of one mistake can be instant death.

Making it into the future is an adult problem. That’s not a death sentence. I think. It’s not the inevitable end of the world. I hope. But if you want humankind to survive, and the future to be a worthwhile place, then this will take careful crafting of the first superintelligence—not just letting economics or whatever take its easy, natural course. The easy, natural course is fatal—not just to ourselves but to all our hopes.

That, itself, is natural. It is only to be expected. To hit a narrow target you must aim; to reach a good destination you must steer; to win, you must make an extra-ordinary effort.