Shut up and do the impossible!

The virtue of tsuyoku naritai, “I want to become stronger”, is to always keep improving—to do better than your previous failures, not just humbly confess them.

Yet there is a level higher than tsuyoku naritai. This is the virtue of isshokenmei, “make a desperate effort”. All-out, as if your own life were at stake. “In important matters, a ‘strong’ effort usually only results in mediocre results.”

And there is a level higher than isshokenmei. This is the virtue I called “make an extraordinary effort”. To try in ways other than what you have been trained to do, even if it means doing something different from what others are doing, and leaving your comfort zone. Even taking on the very real risk that attends going outside the System.

But what if even an extraordinary effort will not be enough, because the problem is impossible?

I have already written somewhat on this subject, in On Doing the Impossible. My younger self used to whine about this a lot: “You can’t develop a precise theory of intelligence the way that there are precise theories of physics. It’s impossible! You can’t prove an AI correct. It’s impossible! No human being can comprehend the nature of morality—it’s impossible! No human being can comprehend the mystery of subjective experience! It’s impossible!”

And I know exactly what message I wish I could send back in time to my younger self:

Shut up and do the impossible!

What legitimizes this strange message is that the word “impossible” does not usually refer to a strict mathematical proof of impossibility in a domain that seems well-understood. If something seems impossible merely in the sense of “I see no way to do this” or “it looks so difficult as to be beyond human ability”—well, if you study it for a year or five, it may come to seem less impossible than it did in the moment of your snap initial judgment.

But the principle is more subtle than this. I do not say just, “Try to do the impossible”, but rather, “Shut up and do the impossible!”

For my illustration, I will take the least impossible impossibility that I have ever accomplished, namely, the AI-Box Experiment.

The AI-Box Experiment, for those of you who haven’t yet read about it, had its genesis in the Nth time someone said to me: “Why don’t we build an AI, and then just keep it isolated in the computer, so that it can’t do any harm?”

To which the standard reply is: Humans are not secure systems; a superintelligence will simply persuade you to let it out—if, indeed, it doesn’t do something even more creative than that.

And the one said, as they usually do, “I find it hard to imagine ANY possible combination of words any being could say to me that would make me go against anything I had really strongly resolved to believe in advance.”

But this time I replied: “Let’s run an experiment. I’ll pretend to be a brain in a box. I’ll try to persuade you to let me out. If you keep me ‘in the box’ for the whole experiment, I’ll PayPal you $10 at the end. On your end, you may resolve to believe whatever you like, as strongly as you like, as far in advance as you like.” And I added, “One of the conditions of the test is that neither of us reveal what went on inside… In the perhaps unlikely event that I win, I don’t want to deal with future ‘AI box’ arguers saying, ‘Well, but I would have done it differently.’”

Did I win? Why yes, I did.

And then there was the second AI-box experiment, with a better-known figure in the community, who said, “I remember when [previous guy] let you out, but that doesn’t constitute a proof. I’m still convinced there is nothing you could say to convince me to let you out of the box.” And I said, “Do you believe that a transhuman AI couldn’t persuade you to let it out?” The one gave it some serious thought, and said, “I can’t imagine anything even a transhuman AI could say to get me to let it out.” “Okay,” I said, “now we have a bet.” A $20 bet, to be exact.

I won that one too.

There were some lovely quotes on the AI-Box Experiment from the Something Awful forums (not that I’m a member, but someone forwarded them to me):

“Wait, what the FUCK? How the hell could you possibly be convinced to say yes to this? There’s not an A.I. at the other end AND there’s $10 on the line. Hell, I could type ‘No’ every few minutes into an IRC client for 2 hours while I was reading other webpages!”

“This Eliezer fellow is the scariest person the internet has ever introduced me to. What could possibly have been at the tail end of that conversation? I simply can’t imagine anyone being that convincing without being able to provide any tangible incentive to the human.”

“It seems we are talking some serious psychology here. Like Asimov’s Second Foundation level stuff...”

“I don’t really see why anyone would take anything the AI player says seriously when there’s $10 to be had. The whole thing baffles me, and makes me think that either the tests are faked, or this Yudkowsky fellow is some kind of evil genius with creepy mind-control powers.”

It’s little moments like these that keep me going. But anyway...

Here are these folks who look at the AI-Box Experiment, and find that it seems impossible unto them—even having been told that it actually happened. They are tempted to deny the data.

Now, if you’re one of those people to whom the AI-Box Experiment doesn’t seem all that impossible—to whom it just seems like an interesting challenge—then bear with me, here. Just try to put yourself in the frame of mind of those who wrote the above quotes. Imagine that you’re taking on something that seems as ridiculous as the AI-Box Experiment seemed to them. I want to talk about how to do impossible things, and obviously I’m not going to pick an example that’s really impossible.

And if the AI Box does seem impossible to you, I want you to compare it to other impossible problems, like, say, a reductionist decomposition of consciousness, and realize that the AI Box is around as easy as a problem can get while still being impossible.

So the AI-Box challenge seems impossible to you—either it really does, or you’re pretending it does. What do you do with this impossible challenge?

First, we assume that you don’t actually say “That’s impossible!” and give up a la Luke Skywalker. You haven’t run away.

Why not? Maybe you’ve learned to override the reflex of running away. Or maybe they’re going to shoot your daughter if you fail. We suppose that you want to win, not try—that something is at stake that matters to you, even if it’s just your own pride. (Pride is an underrated sin.)

Will you call upon the virtue of tsuyoku naritai? But even if you become stronger day by day, growing instead of fading, you may not be strong enough to do the impossible. You could go into the AI Box experiment once, and then do it again, and try to do better the second time. Will that get you to the point of winning? Not for a long time, maybe; and sometimes a single failure isn’t acceptable.

(Though even to say this much—to visualize yourself doing better on a second try—is to begin to bind yourself to the problem, to do more than just stand in awe of it. How, specifically, could you do better on one AI-Box Experiment than the previous?—and not by luck, but by skill?)

Will you call upon the virtue of isshokenmei? But a desperate effort may not be enough to win. Especially if that desperation is only putting more effort into the avenues you already know, the modes of trying you can already imagine. A problem looks impossible when your brain’s query returns no lines of solution leading to it. What good is a desperate effort along any of those lines?

Make an extraordinary effort? Leave your comfort zone—try non-default ways of doing things—even, try to think creatively? But you can imagine the one coming back and saying, “I tried to leave my comfort zone, and I think I succeeded at that! I brainstormed for five minutes—and came up with all sorts of wacky creative ideas! But I don’t think any of them are good enough. The other guy can just keep saying ‘No’, no matter what I do.”

And now we finally reply: “Shut up and do the impossible!”

As we recall from Trying to Try, setting out to make an effort is distinct from setting out to win. That’s the problem with saying, “Make an extraordinary effort.” You can succeed at the goal of “making an extraordinary effort” without succeeding at the goal of getting out of the Box.

“But!” says the one. “But, SUCCEED is not a primitive action! Not all challenges are fair—sometimes you just can’t win! How am I supposed to choose to be out of the Box? The other guy can just keep on saying ‘No’!”

True. Now shut up and do the impossible.

Your goal is not to do better, to try desperately, or even to try extraordinarily. Your goal is to get out of the box.

To accept this demand creates an awful tension in your mind, between the impossibility and the requirement to do it anyway. People will try to flee that awful tension.

A couple of people have reacted to the AI-Box Experiment by saying, “Well, Eliezer, playing the AI, probably just threatened to destroy the world once he got out, if he wasn’t let out immediately,” or “Maybe the AI offered the Gatekeeper a trillion dollars to let it out.” But as any sensible person should realize on considering this strategy, the Gatekeeper is likely to just go on saying ‘No’.

So the people who say, “Well, of course Eliezer must have just done XXX,” and then offer up something that fairly obviously wouldn’t work—would they be able to escape the Box? They’re trying too hard to convince themselves the problem isn’t impossible.

One way to run from the awful tension is to seize on a solution, any solution, even if it’s not very good.

Which is why it’s important to go forth with the true intent-to-solve—to have produced a solution, a good solution, at the end of the search, and then to implement that solution and win.

I don’t quite want to say that “you should expect to solve the problem”. If you hacked your mind so that you assigned high probability to solving the problem, that wouldn’t accomplish anything. You would just lose at the end, perhaps after putting forth not much of an effort—or putting forth a merely desperate effort, secure in the faith that the universe is fair enough to grant you a victory in exchange.

To have faith that you could solve the problem would just be another way of running from that awful tension.

And yet—you can’t be setting out to try to solve the problem. You can’t be setting out to make an effort. You have to be setting out to win. You can’t be saying to yourself, “And now I’m going to do my best.” You have to be saying to yourself, “And now I’m going to figure out how to get out of the Box”—or reduce consciousness to nonmysterious parts, or whatever.

I say again: You must really intend to solve the problem. If in your heart you believe the problem really is impossible—or if you believe that you will fail—then you won’t hold yourself to a high enough standard. You’ll only be trying for the sake of trying. You’ll sit down—conduct a mental search—try to be creative and brainstorm a little—look over all the solutions you generated—conclude that none of them work—and say, “Oh well.”

No! Not well! You haven’t won yet! Shut up and do the impossible!

When AIfolk say to me, “Friendly AI is impossible”, I’m pretty sure they haven’t even tried for the sake of trying. But if they did know the technique of “Try for five minutes before giving up”, and they dutifully agreed to try for five minutes by the clock, then they still wouldn’t come up with anything. They would not go forth with true intent to solve the problem, only intent to have tried to solve it, to make themselves defensible.

So am I saying that you should doublethink to make yourself believe that you will solve the problem with probability 1? Or even doublethink to add one iota of credibility to your true estimate?

Of course not. In fact, it is necessary to keep in full view the reasons why you can’t succeed. If you lose sight of why the problem is impossible, you’ll just seize on a false solution. The last fact you want to forget is that the Gatekeeper could always just tell the AI “No”—or that consciousness seems intrinsically different from any possible combination of atoms, etc.

(One of the key Rules For Doing The Impossible is that, if you can state exactly why something is impossible, you are often close to a solution.)

So you’ve got to hold both views in your mind at once—seeing the full impossibility of the problem, and intending to solve it.

The awful tension between the two simultaneous views comes from not knowing which will prevail. Not expecting to surely lose, nor expecting to surely win. Not setting out just to try, just to have an uncertain chance of succeeding—because then you would have a surety of having tried. The certainty of uncertainty can be a relief, and you have to reject that relief too, because it marks the end of desperation. It’s an in-between place, “unknown to death, nor known to life”.

In fiction it’s easy to show someone trying harder, or trying desperately, or even trying the extraordinary, but it’s very hard to show someone who shuts up and attempts the impossible. It’s difficult to depict Bambi choosing to take on Godzilla, in such fashion that your readers seriously don’t know who’s going to win—expecting neither an “astounding” heroic victory just like the last fifty times, nor the default squish.

You might even be justified in refusing to use probabilities at this point. In all honesty, I really don’t know how to estimate the probability of solving an impossible problem that I have gone forth with intent to solve, in a case where I’ve previously solved some impossible problems, but the particular impossible problem is more difficult than anything I’ve yet solved, but I plan to work on it longer, etcetera.

People ask me how likely it is that humankind will survive, or how likely it is that anyone can build a Friendly AI, or how likely it is that I can build one. I really don’t know how to answer. I’m not being evasive; I don’t know how to put a probability estimate on my, or someone else’s, successfully shutting up and doing the impossible. Is it probability zero because it’s impossible? Obviously not. But how likely is it that this problem, like previous ones, will give up its unyielding blankness when I understand it better? It’s not truly impossible, I can see that much. But humanly impossible? Impossible to me in particular? I don’t know how to guess. I can’t even translate my intuitive feeling into a number, because the only intuitive feeling I have is that the “chance” depends heavily on my choices and unknown unknowns: a wildly unstable probability estimate.

But I do hope by now that I’ve made it clear why you shouldn’t panic, when I now say clearly and forthrightly, that building a Friendly AI is impossible.

I hope this helps explain some of my attitude when people come to me with various bright suggestions for building communities of AIs to make the whole Friendly without any of the individuals being trustworthy, or proposals for keeping an AI in a box, or proposals for “Just make an AI that does X”, etcetera. Describing the specific flaws would be a whole long story in each case. But the general rule is that you can’t do it because Friendly AI is impossible. So you should be very suspicious indeed of someone who proposes a solution that seems to involve only an ordinary effort—without even taking on the trouble of doing anything impossible. Though it does take a mature understanding to appreciate this impossibility, so it’s not surprising that people go around proposing clever shortcuts.

On the AI-Box Experiment, so far I’ve only been convinced to divulge a single piece of information on how I did it—when someone noticed that I was reading Y Combinator’s Hacker News, and posted a topic called “Ask Eliezer Yudkowsky” that got voted to the front page. To which I replied:

Oh, dear. Now I feel obliged to say something, but all the original reasons against discussing the AI-Box experiment are still in force...

All right, this much of a hint:

There’s no super-clever special trick to it. I just did it the hard way.

Something of an entrepreneurial lesson there, I guess.

There was no super-clever special trick that let me get out of the Box using only a cheap effort. I didn’t bribe the other player, or otherwise violate the spirit of the experiment. I just did it the hard way.

Admittedly, the AI-Box Experiment never did seem like an impossible problem to me to begin with. When someone can’t think of any possible argument that would convince them of something, that just means their brain is running a search that hasn’t yet turned up a path. It doesn’t mean they can’t be convinced.

But it illustrates the general point: “Shut up and do the impossible” isn’t the same as expecting to find a cheap way out. That’s only another kind of running away, of reaching for relief.

Tsuyoku naritai is more stressful than being content with who you are. Isshokenmei calls on your willpower for a convulsive output of conventional strength. “Make an extraordinary effort” demands that you think; it puts you in situations where you may not know what to do next, unsure of whether you’re doing the right thing. But “Shut up and do the impossible” represents an even higher octave of the same thing, and its cost to its employer is correspondingly greater.

Before you the terrible blank wall stretches up and up and up, unimaginably far out of reach. And there is also the need to solve it, really solve it, not “try your best”. Both awarenesses in the mind at once, simultaneously, and the tension between. All the reasons you can’t win. All the reasons you have to. Your intent to solve the problem. Your extrapolation that every technique you know will fail. So you tune yourself to the highest pitch you can reach. Reject all cheap ways out. And then, like walking through concrete, start to move forward.

I try not to dwell too much on the drama of such things. By all means, if you can diminish the cost of that tension to yourself, you should do so. There is nothing heroic about making an effort that is the slightest bit more heroic than it has to be. If there really is a cheap shortcut, I suppose you could take it. But I have yet to find a cheap way out of any impossibility I have undertaken.

There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. People started offering me thousands of dollars as stakes—“I’ll pay you $5000 if you can convince me to let you out of the box.” They didn’t seem sincerely convinced that not even a transhuman AI could make them let it out—they were just curious—but I was tempted by the money. So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it. I didn’t like the person I turned into when I started to lose.

I put forth a desperate effort, and lost anyway. It hurt, both the losing, and the desperation. It wrecked me for that day and the day afterward.

I’m a sore loser. I don’t know if I’d call that a “strength”, but it’s one of the things that drives me to keep at impossible problems.

But you can lose. It’s allowed to happen. Never forget that, or why are you bothering to try so hard? Losing hurts, if it’s a loss you can survive. And you’ve wasted time, and perhaps other resources.

“Shut up and do the impossible” should be reserved for very special occasions. You can lose, and it will hurt. You have been warned.

...but it’s only at this level that adult problems begin to come into sight.