Shut up and do the impossible!

The virtue of tsuyoku naritai, “I want to become stronger”, is to always keep improving—to do better than your previous failures, not just humbly confess them.

Yet there is a level higher than tsuyoku naritai. This is the virtue of isshokenmei, “make a desperate effort”. All-out, as if your own life were at stake. “In important matters, a ‘strong’ effort usually only results in mediocre results.”

And there is a level higher than isshokenmei. This is the virtue I called “make an extraordinary effort”. To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone. Even taking on the very real risk that attends going outside the System.

But what if even an extraordinary effort will not be enough, because the problem is impossible?

I have already written somewhat on this subject, in On Doing the Impossible. My younger self used to whine about this a lot: “You can’t develop a precise theory of intelligence the way that there are precise theories of physics. It’s impossible! You can’t prove an AI correct. It’s impossible! No human being can comprehend the nature of morality—it’s impossible! No human being can comprehend the mystery of subjective experience! It’s impossible!”

And I know exactly what message I wish I could send back in time to my younger self:

Shut up and do the impossible!

What legitimizes this strange message is that the word “impossible” does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood. If something seems impossible merely in the sense of “I see no way to do this” or “it looks so difficult as to be beyond human ability”—well, if you study it for a year or five, it may come to seem less impossible than it did in the moment of your snap initial judgment.

But the principle is more subtle than this. I do not say just, “Try to do the impossible”, but rather, “Shut up and do the impossible!”

For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely, the AI-Box Experiment.

The AI-Box Experiment, for those of you who haven’t yet read about it, had its genesis in the Nth time someone said to me: “Why don’t we build an AI, and then just keep it isolated in the computer, so that it can’t do any harm?”

To which the standard reply is: Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn’t do something even more creative than that.

And the one said, as they usually do, “I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance.”

But this time I replied: “Let’s run an experiment. I’ll pretend to be a brain in a box. I’ll try to persuade you to let me out. If you keep me ‘in the box’ for the whole experiment, I’ll PayPal you $10 at the end. On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like.” And I added, “One of the conditions of the test is that neither of us reveal what went on inside… In the perhaps unlikely event that I win, I don’t want to deal with future ‘AI box’ arguers saying, ‘Well, but I would have done it differently.’”

Did I win? Why yes, I did.

And then there was the second AI-box experiment, with a better-known figure in the community, who said, “I remember when [previous guy] let you out, but that doesn’t constitute a proof. I’m still convinced there is nothing you could say to convince me to let you out of the box.” And I said, “Do you believe that a transhuman AI couldn’t persuade you to let it out?” The one gave it some serious thought, and said, “I can’t imagine anything even a transhuman AI could say to get me to let it out.” “Okay,” I said, “now we have a bet.” A $20 bet, to be exact.

I won that one too.

There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I’m a member, but someone forwarded it to me):

“Wait, what the FUCK? How the hell could you possibly be convinced to say yes to this? There’s not an A.I. at the other end AND there’s $10 on the line. Hell, I could type ‘No’ every few minutes into an IRC client for 2 hours while I was reading other webpages!”

“This Eliezer fellow is the scariest person the internet has ever introduced me to. What could possibly have been at the tail end of that conversation? I simply can’t imagine anyone being that convincing without being able to provide any tangible incentive to the human.”

“It seems we are talking some serious psychology here. Like Asimov’s Second Foundation level stuff...”

“I don’t really see why anyone would take anything the AI player says seriously when there’s $10 to be had. The whole thing baffles me, and makes me think that either the tests are faked, or this Yudkowsky fellow is some kind of evil genius with creepy mind-control powers.”

It’s little moments like these that keep me going. But anyway...

Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened. They are tempted to deny the data.

Now, if you’re one of those people to whom the AI-Box Experiment doesn’t seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here. Just try to put yourself in the frame of mind of those who wrote the above quotes. Imagine that you’re taking on something that seems as ridiculous as the AI-Box Experiment seemed to them. I want to talk about how to do impossible things, and obviously I’m not going to pick an example that’s really impossible.

And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.

So the AI-Box challenge seems impossible to you—either it really does, or you’re pretending it does. What do you do with this impossible challenge?

First, we assume that you don’t actually say “That’s impossible!” and give up à la Luke Skywalker. You haven’t run away.

Why not? Maybe you’ve learned to override the reflex of running away. Or maybe they’re going to shoot your daughter if you fail. We suppose that you want to win, not try—that something is at stake that matters to you, even if it’s just your own pride. (Pride is an underrated sin.)

Will you call upon the virtue of tsuyoku naritai? But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible. You could go into the AI Box experiment once, and then do it again, and try to do better the second time. Will that get you to the point of winning? Not for a long time, maybe; and sometimes a single failure isn’t acceptable.

(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it. How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)

Will you call upon the virtue isshokenmei? But a desperate effort may not be enough to win. Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine. A problem looks impossible when your brain’s query returns no lines of solution leading to it. What good is a desperate effort along any of those lines?

Make an extraordinary effort? Leave your comfort zone—try non-default ways of doing things—even, try to think creatively? But you can imagine the one coming back and saying, “I tried to leave my comfort zone, and I think I succeeded at that! I brainstormed for five minutes—and came up with all sorts of wacky creative ideas! But I don’t think any of them are good enough. The other guy can just keep saying ‘No’, no matter what I do.”

And now we finally reply: “Shut up and do the impossible!”

As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win. That’s the problem with saying, “Make an extraordinary effort.” You can succeed at the goal of “making an extraordinary effort” without succeeding at the goal of getting out of the Box.

“But!” says the one. “But, SUCCEED is not a primitive action! Not all challenges are fair—sometimes you just can’t win! How am I supposed to choose to be out of the Box? The other guy can just keep on saying ‘No’!”

True. Now shut up and do the impossible.

Your goal is not to do better, to try desperately, or even to try extraordinarily. Your goal is to get out of the box.

To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.

A couple of people have reacted to the AI-Box Experiment by saying, “Well, Eliezer, playing the AI, probably just threatened to destroy the world whenever he was out, if he wasn’t let out immediately,” or “Maybe the AI offered the Gatekeeper a trillion dollars to let it out.” But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying ‘No’.

So the people who say, “Well, of course Eliezer must have just done XXX,” and then offer up something that fairly obviously wouldn’t work—would they be able to escape the Box? They’re trying too hard to convince themselves the problem isn’t impossible.

One way to run from the awful tension is to seize on a solution, any solution, even if it’s not very good.

Which is why it’s important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.

I don’t quite want to say that “you should expect to solve the problem”. If you hacked your mind so that you assigned high probability to solving the problem, that wouldn’t accomplish anything. You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.

To have faith that you could solve the problem would just be another way of running from that awful tension.

And yet—you can’t be setting out to try to solve the problem. You can’t be setting out to make an effort. You have to be setting out to win. You can’t be saying to yourself, “And now I’m going to do my best.” You have to be saying to yourself, “And now I’m going to figure out how to get out of the Box”—or reduce consciousness to nonmysterious parts, or whatever.

I say again: You must really intend to solve the problem. If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won’t hold yourself to a high enough standard. You’ll only be trying for the sake of trying. You’ll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, “Oh well.”

No! Not well! You haven’t won yet! Shut up and do the impossible!

When AIfolk say to me, “Friendly AI is impossible”, I’m pretty sure they haven’t even tried for the sake of trying. But if they did know the technique of “Try for five minutes before giving up”, and they dutifully agreed to try for five minutes by the clock, then they still wouldn’t come up with anything. They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.

So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1? Or even doublethink to add one iota of credibility to your true estimate?

Of course not. In fact, it is necessary to keep in full view the reasons why you can’t succeed. If you lose sight of why the problem is impossible, you’ll just seize on a false solution. The last fact you want to forget is that the Gatekeeper could always just tell the AI “No”—or that consciousness seems intrinsically different from any possible combination of atoms, etc.

(One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)

So you’ve got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.

The awful tension between the two simultaneous views comes from not knowing which will prevail. Not expecting to surely lose, nor expecting to surely win. Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried. The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation. It’s an in-between place, “unknown to death, nor known to life”.

In fiction it’s easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it’s very hard to show someone who shuts up and attempts the impossible. It’s difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don’t know who’s going to win—expecting neither an “astounding” heroic victory just like the last fifty times, nor the default squish.

You might even be justified in refusing to use probabilities at this point. In all honesty, I really don’t know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve; in a case where I’ve previously solved some impossible problems, but the particular impossible problem is more difficult than anything I’ve yet solved, but I plan to work on it longer, etcetera.

People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don’t know how to answer. I’m not being evasive; I don’t know how to put a probability estimate on my, or someone else’s, successfully shutting up and doing the impossible. Is it probability zero because it’s impossible? Obviously not. But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better? It’s not truly impossible, I can see that much. But humanly impossible? Impossible to me in particular? I don’t know how to guess. I can’t even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the “chance” depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.

But I do hope by now that I’ve made it clear why you shouldn’t panic, when I now say clearly and forthrightly, that building a Friendly AI is impossible.

I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for “Just make an AI that does X”, etcetera. Describing the specific flaws would be a whole long story in each case. But the general rule is that you can’t do it because Friendly AI is impossible. So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible. Though it does take a mature understanding to appreciate this impossibility, so it’s not surprising that people go around proposing clever shortcuts.

On the AI-Box Experiment, so far I’ve only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading Y Combinator’s Hacker News, and posted a topic called “Ask Eliezer Yudkowsky” that got voted to the front page. To which I replied:

Oh, dear. Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force...

All right, this much of a hint:

There’s no super-clever special trick to it. I just did it the hard way.

Something of an entrepreneurial lesson there, I guess.

There was no super-clever special trick that let me get out of the Box using only a cheap effort. I didn’t bribe the other player, or otherwise violate the spirit of the experiment. I just did it the hard way.

Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with. When someone can’t think of any possible argument that would convince them of something, that just means their brain is running a search that hasn’t yet turned up a path. It doesn’t mean they can’t be convinced.

But it illustrates the general point: “Shut up and do the impossible” isn’t the same as expecting to find a cheap way out. That’s only another kind of running away, of reaching for relief.

Tsuyoku naritai is more stressful than being content with who you are. Isshokenmei calls on your willpower for a convulsive output of conventional strength. “Make an extraordinary effort” demands that you think; it puts you in situations where you may not know what to do next, unsure of whether you’re doing the right thing. But “Shut up and do the impossible” represents an even higher octave of the same thing, and its cost to the one who employs it is correspondingly greater.

Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach. And there is also the need to solve it, really solve it, not “try your best”. Both awarenesses in the mind at once, simultaneously, and the tension between. All the reasons you can’t win. All the reasons you have to. Your intent to solve the problem. Your extrapolation that every technique you know will fail. So you tune yourself to the highest pitch you can reach. Reject all cheap ways out. And then, like walking through concrete, start to move forward.

I try not to dwell too much on the drama of such things. By all means, if you can diminish the cost of that tension to yourself, you should do so. There is nothing heroic about making an effort that is the slightest bit more heroic than it has to be. If there really is a cheap shortcut, I suppose you could take it. But I have yet to find a cheap way out of any impossibility I have undertaken.

There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. People started offering me thousands of dollars as stakes—“I’ll pay you $5000 if you can convince me to let you out of the box.” They didn’t seem sincerely convinced that not even a transhuman AI could make them let it out—they were just curious—but I was tempted by the money. So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it. I didn’t like the person I turned into when I started to lose.

I put forth a desperate effort, and lost anyway. It hurt, both the losing, and the desperation. It wrecked me for that day and the day afterward.

I’m a sore loser. I don’t know if I’d call that a “strength”, but it’s one of the things that drives me to keep at impossible problems.

But you can lose. It’s allowed to happen. Never forget that, or why are you bothering to try so hard? Losing hurts, if it’s a loss you can survive. And you’ve wasted time, and perhaps other resources.

“Shut up and do the impossible” should be reserved for very special occasions. You can lose, and it will hurt. You have been warned.

...but it’s only at this level that adult problems begin to come into sight.