...Recursion, Magic

Eliezer Yudkowsky, 25 Nov 2008 9:10 UTC

27 points

AI, AI Takeoff, General Intelligence, Recursive Self-Improvement

Followup to: Cascades, Cycles, Insight...

...4, 5 sources of discontinuity.

Recursion is probably the most difficult part of this topic. We have historical records aplenty of cascades, even if untangling the causality is difficult. Cycles of reinvestment are the heartbeat of the modern economy. An insight that makes a hard problem easy, is something that I hope you’ve experienced at least once in your life...

But we don’t have a whole lot of experience redesigning our own neural circuitry.

We have these wonderful things called “optimizing compilers”. A compiler translates programs in a high-level language, into machine code (though these days it’s often a virtual machine). An “optimizing compiler”, obviously, is one that improves the program as it goes.

So why not write an optimizing compiler in its own language, and then run it on itself? And then use the resulting optimized optimizing compiler, to recompile itself yet again, thus producing an even more optimized optimizing compiler -

Halt! Stop! Hold on just a minute! An optimizing compiler is not supposed to change the logic of a program—the input/output relations. An optimizing compiler is only supposed to produce code that does the same thing, only faster. A compiler isn’t remotely near understanding what the program is doing and why, so it can’t presume to construct a better input/output function. We just presume that the programmer wants a fixed input/output function computed as fast as possible, using as little memory as possible.

So if you run an optimizing compiler on its own source code, and then use the product to do the same again, it should produce the same output on both occasions—at most, the first-order product will run faster than the original compiler.

If we want a computer program that experiences cascades of self-improvement, the path of the optimizing compiler does not lead there—the “improvements” that the optimizing compiler makes upon itself, do not improve its ability to improve itself.
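The fixed-point argument above can be made concrete with a toy model. This is only a sketch under loud assumptions: the `Binary` and `Source` types, the `compile_with` function, and the 1.2 speed figure are all hypothetical stand-ins, not a description of any real compiler. The one property the model encodes is the one the argument relies on: the emitted code depends on the compiler's logic, never on how fast the compiler itself happens to run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    logic: str    # which input/output function the program computes
    speed: float  # how fast the program runs (1.0 = unoptimized baseline)


@dataclass(frozen=True)
class Source:
    logic: str    # the input/output function the source code specifies


def compile_with(compiler: Binary, source: Source) -> Binary:
    """Toy compile step: logic is always preserved; only speed can change.

    The 1.2 factor is an arbitrary, hypothetical speedup applied whenever
    the compiler doing the work contains optimization logic.
    """
    speed = 1.2 if compiler.logic == "optimizing-compiler" else 1.0
    return Binary(logic=source.logic, speed=speed)


compiler_source = Source(logic="optimizing-compiler")
bootstrap = Binary(logic="naive-compiler", speed=1.0)

# Stage 1: optimizing logic, but built by a non-optimizing compiler, so slow.
stage1 = compile_with(bootstrap, compiler_source)
# Stage 2: same logic, now built by an optimizing compiler, so faster.
stage2 = compile_with(stage1, compiler_source)
# Stage 3: built by the faster compiler, yet identical to stage 2.
stage3 = compile_with(stage2, compiler_source)

print(stage1, stage2, stage3, sep="\n")
```

In this toy, `stage2` runs faster than `stage1`, but `stage3` equals `stage2`: once the compiler has been compiled by its own logic, recompiling changes nothing further, which is the sense in which the loop does not cascade.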

Now if you are one of those annoying nitpicky types, like me, you will notice a flaw in this logic: suppose you built an optimizing compiler that searched over a sufficiently wide range of possible optimizations, that it did not ordinarily have time to do a full search of its own space—so that, when the optimizing compiler ran out of time, it would just implement whatever speedups it had already discovered. Then the optimized optimizing compiler, although it would only implement the same logic faster, would do more optimizations in the same time—and so the second output would not equal the first output.

Well… that probably doesn’t buy you much. Let’s say the optimized program is 20% faster, that is, it gets 20% more done in the same time. Then, unrealistically assuming “optimization” is linear, the 2-optimized program will be 24% faster, the 3-optimized program will be 24.8% faster, and so on until we top out at a 25% improvement. k < 1.
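To check that arithmetic, here is a minimal sketch of the recurrence, assuming (as the paragraph does) that optimization is linear: a compiler already running `1 + s` times as fast as the baseline gets `1 + s` times as much searching done in its time budget, and so finds a speedup of `gain * (1 + s)`. The function names and the recurrence form are my own illustrative choices.

```python
def iterate_speedup(gain: float, rounds: int) -> list[float]:
    """Return the speedup after each round of self-recompilation.

    s_0 = 0 (the unoptimized compiler); s_{n+1} = gain * (1 + s_n).
    """
    speedups = []
    s = 0.0
    for _ in range(rounds):
        s = gain * (1.0 + s)
        speedups.append(s)
    return speedups


def limit_speedup(gain: float) -> float:
    """Fixed point of s = gain * (1 + s); only meaningful for gain < 1."""
    if not 0.0 <= gain < 1.0:
        raise ValueError("the series only converges for 0 <= gain < 1")
    return gain / (1.0 - gain)


for n, s in enumerate(iterate_speedup(0.2, 5), start=1):
    print(f"{n}-optimized: {s:.4%} faster")
print(f"limit: {limit_speedup(0.2):.4%} faster")
```

With `gain = 0.2` the first three rounds come out at 20%, 24%, and 24.8%, and the fixed point is 0.2 / 0.8 = 25%. Each round adds only a fifth of what the previous round added, so the total is a convergent geometric series.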

So let us turn aside from optimizing compilers, and consider a more interesting artifact, EURISKO.

To the best of my inexhaustive knowledge, EURISKO may still be the most sophisticated self-improving AI ever built—in the 1980s, by Douglas Lenat before he started wasting his life on Cyc. EURISKO was applied in domains ranging from the Traveller war game (EURISKO became champion without having ever before fought a human) to VLSI circuit design.

EURISKO used “heuristics” to, for example, design potential space fleets. It also had heuristics for suggesting new heuristics, and metaheuristics could apply to any heuristic, including metaheuristics. E.g. EURISKO started with the heuristic “investigate extreme cases” but moved on to “investigate cases close to extremes”. The heuristics were written in RLL, which stands for Representation Language Language. According to Lenat, it was figuring out how to represent the heuristics in such fashion that they could usefully modify themselves without always just breaking, that consumed most of the conceptual effort in creating EURISKO.

But EURISKO did not go foom.

EURISKO could modify even the metaheuristics that modified heuristics. EURISKO was, in an important sense, more recursive than either humans or natural selection—a new thing under the Sun, a cycle more closed than anything that had ever existed in this universe.

Still, EURISKO ran out of steam. Its self-improvements did not spark a sufficient number of new self-improvements. This should not really be too surprising—it’s not as if EURISKO started out with human-level intelligence plus the ability to modify itself—its self-modifications were either evolutionarily blind, or produced by the simple procedural rules of some heuristic or other. That’s not going to navigate the search space very fast on an atomic level. Lenat did not stand dutifully apart from his creation, but stepped in and helped EURISKO prune its own heuristics. But in the end EURISKO ran out of steam, and Lenat couldn’t push it any further.

EURISKO lacked what I called “insight”—that is, the type of abstract knowledge that lets humans fly through the search space. And so its recursive access to its own heuristics proved to be for nought.

Unless, y’know, you’re counting becoming world champion at Traveller without ever previously playing a human, as some sort of accomplishment.

But it is, thankfully, a little harder than that to destroy the world—as Lenat’s experimental test informed us.

Robin previously asked why Douglas Engelbart did not take over the world, despite his vision of a team building tools to improve tools, and his anticipation of tools like computer mice and hypertext.

One reply would be, “Sure, a computer gives you a 10% advantage in doing various sorts of problems, some of which include computers—but there’s still a lot of work that the computer doesn’t help you with—and the mouse doesn’t run off and write better mice entirely on its own—so k < 1, and it still takes large amounts of human labor to advance computer technology as a whole—plus a lot of the interesting knowledge is nonexcludable so it’s hard to capture the value you create—and that’s why Buffett could manifest a better take-over-the-world-with-sustained-higher-interest-rates than Engelbart.”

But imagine that Engelbart had built a computer mouse, and discovered that each click of the mouse raised his IQ by one point. Then, perhaps, we would have had a situation on our hands.

Maybe you could diagram it something like this:

Metacognitive level: Evolution is the metacognitive algorithm which produced the wiring patterns and low-level developmental rules for human brains.
Cognitive level: The brain processes its knowledge (including procedural knowledge) using algorithms that are quite mysterious to the user within them. Trying to program AIs with the sort of instructions humans give each other usually proves not to do anything: the machinery activated by the levers is missing.
Metaknowledge level: Knowledge and skills associated with e.g. “science” as an activity to carry out using your brain—instructing you when to try to think of new hypotheses using your mysterious creative abilities.
Knowledge level: Knowing how gravity works, or how much weight steel can support.
Object level: Specific actual problems, like building a bridge or something.

This is a causal tree, and changes at levels closer to root have greater impacts as the effects cascade downward.

So one way of looking at it is: “A computer mouse isn’t recursive enough.”

This is an issue that I need to address at further length, but for today I’m out of time.

Magic is the final factor I’d like to point out, at least for now, in considering sources of discontinuity for self-improving minds. By “magic” I naturally do not refer to this. Rather, “magic” in the sense that if you asked 19th-century Victorians what they thought the future would bring, they would have talked about flying machines or gigantic engines, and a very few true visionaries would have suggested space travel or Babbage computers. Nanotechnology, not so much.

The future has a reputation for accomplishing feats which the past thought impossible. Future civilizations have even broken what past civilizations thought (incorrectly, of course) to be the laws of physics. If prophets of 1900 AD—never mind 1000 AD—had tried to bound the powers of human civilization a billion years later, some of those impossibilities would have been accomplished before the century was out; transmuting lead into gold, for example. Because we remember future civilizations surprising past civilizations, it has become cliche that we can’t put limits on our great-grandchildren.

And yet everyone in the 20th century, in the 19th century, and in the 11th century, was human. There is also the sort of magic that a human gun is to a wolf, or the sort of magic that human genetic engineering is to natural selection.

To “improve your own capabilities” is an instrumental goal, and if a smarter intelligence than my own is focused on that goal, I should expect to be surprised. The mind may find ways to produce larger jumps in capability than I can visualize myself. Where higher creativity than mine is at work and looking for shorter shortcuts, the discontinuities that I imagine may be dwarfed by the discontinuities that it can imagine.

And remember how little progress it takes—just a hundred years of human time, with everyone still human—to turn things that would once have been “unimaginable” into heated debates about feasibility. So if you build a mind smarter than you, and it thinks about how to go FOOM quickly, and it goes FOOM faster than you imagined possible, you really have no right to complain—based on the history of mere human history, you should have expected a significant probability of being surprised. Not, surprised that the nanotech is 50% faster than you thought it would be. Surprised the way the Victorians would have been surprised by nanotech.

Thus the last item on my (current, somewhat ad-hoc) list of reasons to expect discontinuity: Cascades, cycles, insight, recursion, magic.

Will_Pearson 25 Nov 2008 10:01 UTC
1 point
Do you have any evidence that insight is applicable to understanding and creating intelligences? Because without that, recursion isn’t powerful and magic doesn’t start to get off the ground.
RobinHanson 25 Nov 2008 10:27 UTC
7 points
You really think an office worker with modern computer tools is only 10% more productive than one with 1950-era non-computer tools? Even at the task of creating better computer tools?

Many important innovations can be thought of as changing the range of things that can be changed, relative to an inheritance that up to that point was not usefully open to focused or conscious development. And each new item added to the list of things we can usefully change increases the possibilities for growing everything else. (While this potentially allows for an increase in the growth rate, rate changes have actually been very rare.) Why aren’t all these changes “recursive”? Why reserve that name only for changes to our mental architecture?
david5 25 Nov 2008 11:03 UTC
3 points
Presumably, if office workers were all obliged to concentrate on creating better computer tools—as is the case in an analogous nuclear pile, the neutrons don’t really have a choice—then a 10% improvement in productivity would be sufficient.

But an economy is different, surely! An office worker may be potentially better at creating better computer tools. But the equilibrium rate of creating better computer tools is perhaps not sufficiently high?
RobinHanson 25 Nov 2008 13:46 UTC
9 points
You speculate about why Eurisko slowed to a halt and then complain that Lenat has wasted his life with CYC, but you ignore that Lenat has his own theory which he gives as the reason he’s been pursuing CYC. You should at least explain why you think his theory wrong; I find his theory quite plausible.
Savage 25 Nov 2008 14:35 UTC
3 points
“You should at least explain why you think his theory wrong”

Please don’t ask that… I’ve heard the parables too many times to count, about how you can’t build an AI by putting in individual pieces of knowledge, you need something that generates these pieces of knowledge itself, and so on.

CYC has become a cliche absurdity...
Carl_Shulman 25 Nov 2008 14:40 UTC
0 points
If CYC can be largely brute-forced with enough info and researcher hours, then shouldn’t we expect a million sped-up instances of Doug Lenat to be able to reach hard-coded AI to input enough info rather quickly?
Dan7 25 Nov 2008 16:30 UTC
2 points
Can someone give a brief definition of FOOM or the link that explains it?
frelkins 25 Nov 2008 16:55 UTC
5 points
@Dan

FOOM (actually, it should be FWOOM!) is onomatopoeia for the sound of flash ignition—imagine filling your oven with gas and then tossing in a match - FWOOM! Also in video games it’s often the sound your head makes when it explodes.

I believe it’s Robin’s humor to designate the “feeling” of the hard takeoff, and humanity’s social reaction to such major sudden change.
michael_vassar3 25 Nov 2008 18:09 UTC
6 points
I have been an office worker using easily documented approximate solutions generated via algorithms to do what my grandparents’ generation would have done with provably correct logical solutions to the same problems. They would have taken less time in some respects and more in others. On net, I’d guess that we weren’t even 10% more productive. We generated many so-called “solutions” in the time they would have taken to generate one solution, but their solution would have been the correct solution while our procedure would be to then choose one of our many solutions for political reasons. We didn’t take less time per project. We had more secretarial staff. We accomplished the same sort of work, and we did a worse job. Maybe ¹⁄₃ as productive overall?

Obviously, the tools we used could have been used to increase productivity, but could have != did. This phenomenon, as well as very rough measures, may explain the supposed uniformity of growth rates. An AGI with shared goal content and closed loops of self-improvement would not have the same difficulties as an economy in this respect and might plausibly be expected to show significant growth rate increases from insights on the level of the mouse.
John_Maxwell2 25 Nov 2008 19:14 UTC
0 points
Perhaps, in analogy with Fermi’s pile, there is a certain critical mass of intelligence that is necessary for an AI to go FOOM. Can we figure out how much intelligence is needed? Is it reasonable to assume that it is more than the effective intelligence of all of the AI researchers working in AI? Or more conservatively, the intelligence of one AI researcher?
Phil_Goetz5 25 Nov 2008 19:17 UTC
10 points
“You speculate about why Eurisko slowed to a halt and then complain that Lenat has wasted his life with CYC, but you ignore that Lenat has his own theory which he gives as the reason he’s been pursuing CYC. You should at least explain why you think his theory wrong; I find his theory quite plausible.”
- Around 1990, Lenat predicted that Cyc would go FOOM by 2000. In 1999, he told me he expected it to go FOOM within a couple of years. Where’s the FOOM?
- Cyc has no cognitive architecture. It’s a database. You can ask it questions. It has templates for answering specific types of questions. It has (last I checked, about 10 years ago) no notion of goals, actions, plans, learning, or its own agenthood.
Doug_S. 25 Nov 2008 20:19 UTC
8 points
I think those prophets of circa 1900 would still get some bounds correct...

If someone points out to you that your pet theory of the universe is in disagreement with Maxwell’s equations — then so much the worse for Maxwell’s equations. If it is found to be contradicted by observation — well, these experimentalists do bungle things sometimes. But if your theory is found to be against the second law of thermodynamics I can give you no hope; there is nothing for it but to collapse in deepest humiliation.

- Sir Arthur Stanley Eddington, The Nature of the Physical World (1928), chapter 4
Eliezer Yudkowsky 25 Nov 2008 21:03 UTC
11 points
You speculate about why Eurisko slowed to a halt and then complain that Lenat has wasted his life with CYC, but you ignore that Lenat has his own theory which he gives as the reason he’s been pursuing CYC. You should at least explain why you think his theory wrong; I find his theory quite plausible.

Artificial Addition, The Nature of Logic, Truly Part of You, Words as Mental Paintbrush Handles, Detached Lever Fallacy...

You really think an office worker with modern computer tools is only 10% more productive than one with 1950-era non-computer tools? Even at the task of creating better computer tools?

I’d started to read Engelbart’s vast proposal-paper, and he was talking about computers as a tool of intelligence enhancement. It’s this that I had in mind when, trying to be generous, I said “10%”. Obviously there are various object-level problems at which someone with a computer is a lot more productive, like doing complicated integrals with no analytic solution.

But what concerns us is the degree of reinvestable improvement, the sort of improvement that will go into better tools that can be used to make still better tools. Office work isn’t a candidate for this.

And yes, we use programming languages to write better programming languages—but there are some people out there who still swear by Emacs; would the state of computer science be so terribly far behind where it is now, after who knows how many cycles of reinvestment, if the mouse had still not been invented?

I don’t know, but to the extent such an effect existed, I would expect it to be more due to less popular uptake leading to less investment—and not a whole lot due to losing out on the compound interest from a mouse making you, allegedly, 10% smarter, including 10% smarter at the kind of computer science that helps you do further computer science.
Philip_Hunt 26 Nov 2008 2:10 UTC
0 points
Robin Hanson: You really think an office worker with modern computer tools is only 10% more productive than one with 1950-era non-computer tools? Even at the task of creating better computer tools?

Not in the general case. However, I’ve known people who use these modern computer tools to take ages typesetting a simple document, playing with fonts, colours and styles, and have managed to get very little work done. I’ve even known people who’ve managed to get negative work done due to their incompetence in using computers.
Philip_Hunt 26 Nov 2008 2:13 UTC
3 points
Eliezer Yudkowsky: EURISKO may still be the most sophisticated self-improving AI ever built

Why is that, do you think? Eurisko was written a quarter-century ago, and is not exactly obscure. Have people not tried to build a better Eurisko, or have they tried and failed?
- VAuroch 23 Dec 2013 7:44 UTC
  4 points
  Parent
  I think Eurisko actually is obscure, though it may not have been at the time. I’ve never heard any other reference to it.
- private_messaging 23 Dec 2013 10:27 UTC
  1 point
  Parent
  In my experience with people in humanities, output strings like “unimportant old publication may still be the most comprehensive writing on the topic of XYZ” are seldom a product of some sort of familiarity with said publication or the topic of XYZ.
babar 1 Dec 2008 20:57 UTC
−4 points
you just need to choose the right invariant which is orthogonal to intelligence and then optimise intelligence
Tim_Tyler 1 Dec 2008 23:20 UTC
7 points
yes, we use programming languages to write better programming languages—but there are some people out there who still swear by Emacs [...]

It’s not just computer languages. Computers help build better computers, editors help build better editors—and so on.

Plus there are synergetic loops: computers help build better editors, and editors help build better computers.

Look at the whole system and you see that the whole man-machine symbiosis builds the next generation of man-machine symbiosis.

The result is the cumulative snowballing of technological progress: technology making better technology. That has actually been going on for billions of years, if you include “natural technologies” such as photosynthesis, cellulose, sexual reproduction, and so on.

The result is what I refer to as the ongoing “technology explosion”. From such a perspective, superintelligent machines are the latest move in an ancient dance.
Jay_Ballou 14 Feb 2009 0:34 UTC
0 points
All “insights” eventually bottom out in the same way that Eurisko bottomed out; the notion of ever-increasing gain by applying some rule or metarule is a fantasy. You make the same sort of mistake about “insight” as do people like Roger Penrose, who believes that humans can “see” things that no computer could, except that you think that a computer can too, whereas in reality neither humans nor computers have access to any such magical “insight” sauce.

The future has a reputation for accomplishing feats which the past thought impossible.

Yes, but our ability to predict which seemingly impossible feats will actually be accomplished is quite poor, so this fact is neither here nor there, but it is appealed to by crackpots everywhere as an ad hoc defense of their own claims.
Jay_Ballou 14 Feb 2009 0:42 UTC
0 points
editors help build better editors

But not by much, and not much better editors. It takes something else—better concepts—to build significantly better editors, and one only needs a moderately good editor in the process. And regardless of things like Eurisko, no one has the faintest idea how to automate having significantly better concepts.
mattnewport 30 Apr 2010 6:00 UTC
3 points

So if you run an optimizing compiler on its own source code, and then use the product to do the same again, it should produce the same output on both occasions—at most, the first-order product will run faster than the original compiler.

Now if you are one of those annoying nitpicky types, like me, you will notice a flaw in this logic:

As one of those annoying nitpicky types I think it is perhaps interesting to note that the current highest level of optimization available with the Microsoft C++ compilers is PGO or Profile Guided Optimization. The basic idea is to collect data from actual runs of the program and feed this back into the optimizer to guide its optimization decisions. Adaptive Optimization is the same basic idea applied to optimize code while it is running based on live performance data.

As you might predict, the speedups achievable by such techniques are worthwhile but modest and not FOOM-worthy.
kilobug 21 Sep 2011 20:25 UTC
2 points
FYI: the link to the “Traveller wargame” (http://www.aliciapatterson.org/APF0704/Johnson/Johnson.html) is broken.
- RobinZ 21 Sep 2011 21:02 UTC
  3 points
  Parent
  The Internet Archive has a copy.
  - Vladimir_Nesov 21 Sep 2011 21:32 UTC
    2 points
    Parent
    Fixed.
    - RobinZ 22 Oct 2011 19:57 UTC
      2 points
      Parent
      Better fix: the updated URL.
gwern 13 Sep 2012 17:47 UTC
10 points

Unless, y’know, you’re counting becoming world champion at Traveller without ever previously playing a human, as some sort of accomplishment.

It certainly is interesting, especially given the weirdly extreme or suicidal tactics it used. I recently ran into this quote on Slate and was struck by the similarity:

40K may not be a true simulation of armed conflict, but it’s part of a centuries-long tradition of war games. After World War II, U.S. Navy Adm. Chester W. Nimitz credited gaming for helping the Allies prepare. “The war with Japan had been re-enacted in the game rooms here by so many people and in so many different ways,” he said, “that nothing that happened during the war was a surprise—absolutely nothing except the kamikaze tactics towards the end of the war; we had not visualized those.”
sbenthall 28 Dec 2012 2:46 UTC
4 points
I find this difficult to follow. Is there a concrete mathematical definition of ‘recursion’ in this sense available anywhere?