A Premature Word on AI

Followup to: A.I. Old-Timers, Do Scientists Already Know This Stuff?

In response to Robin Hanson’s post on the disillusionment of old-time AI researchers such as Roger Schank, I thought I’d post a few premature words on AI, even though I’m not really ready to do so:


I never expected AI to be easy. I went into the AI field because I thought it was world-crackingly important, and I was willing to work on it if it took the rest of my whole life, even though it looked incredibly difficult.

I’ve noticed that folks who actively work on Artificial General Intelligence seem to have started out thinking the problem was much easier than it first appeared to me.

In retrospect, if I had not thought that the AGI problem was worth a hundred and fifty thousand human lives per day—that’s what I thought in the beginning—then I would not have challenged it; I would have run away and hid like a scared rabbit. Everything I now know about how to not panic in the face of difficult problems, I learned from tackling AGI, and later, the superproblem of Friendly AI, because running away wasn’t an option.

Try telling one of these AGI folks about Friendly AI, and they reel back, surprised, and immediately say, “But that would be too difficult!” In short, they have the same run-away reflex as anyone else, but AGI has not activated it. (FAI does.)

Roger Schank is not necessarily in this class, please note. Most of the people currently wandering around in the AGI Dungeon are those too blind to see the warning signs, the skulls on spikes, the flaming pits. But e.g. John McCarthy is a warrior of a different sort; he ventured into the AI Dungeon before it was known to be difficult. I find that in terms of raw formidability, the warriors who first stumbled across the Dungeon impress me rather more than most of the modern explorers—the first explorers were not self-selected for folly. But alas, their weapons tend to be extremely obsolete.

There are many ways to run away from difficult problems. Some of them are exceedingly subtle.

What makes a problem seem impossible? That no avenue of success is visible to the mind. What makes a problem seem scary? That you don’t know what to do next.

Let’s say that the problem of creating a general intelligence seems scary, because you have no idea how to do it. You could run away by working on chess-playing programs instead. Or you could run away by saying, “All past AI projects failed due to lack of computing power.” Then you don’t have to face the unpleasant prospect of staring at a blank piece of paper until drops of blood form on your forehead—the best description I’ve ever heard of the process of searching for core insight. You have avoided placing yourself in a condition where your daily work may consist of not knowing what to do next.

But “Computing power!” is a mysterious answer to a mysterious question. Even after you believe that all past AI projects failed “due to lack of computing power”, it doesn’t make intelligence any less mysterious. “What do you mean?” you say indignantly, “I have a perfectly good explanation for intelligence: it emerges from lots of computing power! Or knowledge! Or complexity!” And this is a subtle issue to which I must probably devote more posts. But if you contrast the rush of insight into details and specifics that follows from learning about, say, Pearlian causality, you may realize that “Computing power causes intelligence” does not constrain detailed anticipation of phenomena even in retrospect.

People are not systematically taught what to do when they’re scared; everyone’s got to work it out on their own. And so the vast majority stumble into simple traps like mysterious answers or affective death spirals. I too stumbled, but I managed to recover and get out alive; and realized what it was that I’d learned; and then I went back into the Dungeon, because I had something to protect.

I’ve recently discussed how scientists are not taught to handle chaos, so I’m emphasizing that aspect in this particular post, as opposed to a dozen other aspects… If you want to appreciate the inferential distances here, think of how odd all this would sound without the Einstein sequence. Then think of how odd the Einstein sequence would have sounded without the many-worlds sequence… There’s plenty more where that came from.

What does progress in AGI/FAI look like, if not bigger and faster computers?

It looks like taking down the real barrier, the scary barrier, the one where you have to sweat blood: understanding things that seem mysterious, and not by declaring that they’re “emergent” or “complex”, either.

If you don’t understand the family of Cox’s Theorems and the Dutch Book argument, you can go round and round with “certainty factors” and “fuzzy logics” that seem sorta appealing, but that can never quite be made to work right. Once you understand the structure of probability—not just probability as an explicit tool, but as a forced implicit structure in cognitive engines—even if the structure is only approximate—then you begin to actually understand what you’re doing; you are not just trying things that seem like good ideas. You have achieved core insight. You are not even limited to floating-point numbers between 0 and 1 to represent probability; you have seen through to structure, and can use log odds or smoke signals if you wish.
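To make the “seen through to structure” point concrete, here is a minimal sketch of my own (Python, with invented numbers—nothing from the original post): the same Bayesian update carried out on a probability and in log odds, where evidence simply adds. The representations differ; the underlying structure is identical.

```python
import math

def update_prob(prior, likelihood_ratio):
    """Bayesian update on a probability: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

def update_log_odds(log_odds, likelihood_ratio):
    """The same update in log-odds form: evidence just adds."""
    return log_odds + math.log(likelihood_ratio)

prior = 0.2   # hypothetical prior
lr = 3.0      # evidence three times as likely if the hypothesis is true

p1 = update_prob(prior, lr)
lo = update_log_odds(math.log(prior / (1 - prior)), lr)
p2 = 1 / (1 + math.exp(-lo))   # convert log odds back to a probability

assert abs(p1 - p2) < 1e-12    # same posterior either way
```

Whether you store a float between 0 and 1 or a log-odds value, the update is the same multiplication of odds by a likelihood ratio; the number format is incidental.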

If you don’t understand graphical models of conditional independence, you can go round and round inventing new “default logics” and “defeasible logics” that get more and more complicated as you try to incorporate an infinite number of special cases. If you know the graphical structure, and why the graphical model works, and the regularity of the environment that it exploits, and why it is efficient as well as correct, then you really understand the problem; you are not limited to explicit Bayesian networks, you just know that you have to exploit a certain kind of mathematical regularity in the environment.
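As a toy illustration of the regularity being exploited (again my own sketch, with invented numbers): in a chain A → B → C, the factorization P(A)·P(B|A)·P(C|B) makes C conditionally independent of A once B is known, and this can be checked numerically.

```python
from itertools import product

# Toy chain A -> B -> C with invented conditional probability tables.
pA = {0: 0.6, 1: 0.4}
pB_given_A = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
pC_given_B = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    # The graphical model's factorization: P(a) P(b|a) P(c|b).
    return pA[a] * pB_given_A[a][b] * pC_given_B[b][c]

# Conditional independence: P(C | A, B) == P(C | B).
# Once B is known, A tells you nothing more about C.
for a, b, c in product((0, 1), repeat=3):
    p_c_given_ab = joint(a, b, c) / sum(joint(a, b, x) for x in (0, 1))
    assert abs(p_c_given_ab - pC_given_B[b][c]) < 1e-12
```

The point is not the particular numbers but the factorization: the graph tells you which variables you may drop from a conditional, and that regularity is what makes inference efficient as well as correct.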

Unfortunately, these two insights—Bayesian probability and Pearlian causality—are far from sufficient to solve general AI problems. If you try to do anything with these two theories that requires an additional key insight you do not yet possess, you will fail just like any other AGI project, and build something that grows more and more complicated and patchworky but never quite seems to work the way you hoped.

These two insights are examples of what “progress in AI” looks like.

Most people who say they intend to tackle AGI do not understand Bayes or Pearl. Most of the people in the AI Dungeon are there because they think they found the Sword of Truth in an old well, or, even worse, because they don’t realize the problem is difficult. They are not polymaths; they are not making a convulsive desperate effort to solve the unsolvable. They are optimists who have their Great Idea that is the best idea ever even though they can’t say exactly how it will produce intelligence, and they want to do the scientific thing and test their hypothesis. If they hadn’t started out thinking they already had the Great Idea, they would have run away from the Dungeon; but this does not give them much of a motive to search for other master keys, even the ones already found.

The idea of looking for an “additional insight you don’t already have” is something that the academic field of AI is just not set up to do. As a strategy, it does not result in a reliable success (defined as a reliable publication). As a strategy, it requires additional study and large expenditures of time. It ultimately amounts to “try to be Judea Pearl or Laplace”, and that is not something that professors have been reliably taught to teach undergraduates; even though it is often what a field in a state of scientific chaos needs.

John McCarthy said quite well what Artificial Intelligence needs: 1.7 Einsteins, 2 Maxwells, 5 Faradays and .3 Manhattan Projects. From this I am forced to subtract the “Manhattan Project”, because security considerations of FAI prohibit using that many people; but I doubt it’ll take more than another 1.5 Maxwells and 0.2 Faradays to make up for it.

But, as said, the field of AI is not set up to support this—it is set up to support explorations with reliable payoffs.

You would think that there would be genuinely formidable people going into the Dungeon of Generality, nonetheless, because they wanted to test their skills against true scientific chaos. Even if they hadn’t yet realized that their little sister is down there. Well, that sounds very attractive in principle, but I guess it sounds a lot less attractive when you have to pay the rent. Or they’re all off doing string theory, because AI is well-known to be impossible, not the sort of chaos that looks promising—why, it’s genuinely scary! You might not succeed, if you went in there!

But I digress. This began as a response to Robin Hanson’s post “A.I. Old-Timers”, and Roger Schank’s very different idea of what future AI progress will look like.

Okay, let’s take a look at Roger Schank’s argument:

I have not soured on AI. I still believe that we can create very intelligent machines. But I no longer believe that those machines will be like us… What AI can and should build are intelligent special purpose entities. (We can call them Specialized Intelligences or SI’s.) Smart computers will indeed be created. But they will arrive in the form of SI’s, ones that make lousy companions but know every shipping accident that ever happened and why (the shipping industry’s SI) or as an expert on sales (a business world SI.)

I ask the fundamental question of rationality: Why do you believe what you believe?

Schank would seem to be talking as if he knows something about the course of future AI research—research that hasn’t happened yet. What is it that he thinks he knows? How does he think he knows it?

As John McCarthy said: “Your statements amount to saying that if AI is possible, it should be easy. Why is that?”

There is a master strength behind all human arts: Human intelligence can, without additional adaptation, create the special-purpose systems of a skyscraper, a gun, a space shuttle, a nuclear weapon, a DNA synthesizer, a high-speed computer...

If none of what the human brain does is magic, the combined trick of it can be recreated in purer form.

If this can be done, someone will do it. The fact that shipping-inventory programs can be built as well does not mean it is sensible to talk about people only building shipping-inventory programs, if it is also possible to build something of human+ power. In a world where both events occur, the course of history is dominated by the latter.

So what is it that Roger Schank learned, as Bayesian evidence confirming some specific hypothesis over its alternatives (and what is the hypothesis, exactly?), that reveals to him the future course of AI research—namely, that AI will not succeed in creating anything of general capability?

It would seem rather difficult to predict the future course of research you have not yet done. Wouldn’t Schank have to know the best solution in order to know the minimum time the best solution would take?

Of course I don’t think Schank is actually doing a Bayesian update here. I think Roger Schank gives the game away when he says:

When reporters interviewed me in the ’70s and ’80s about the possibilities for Artificial Intelligence I would always say that we would have machines that are as smart as we are within my lifetime. It seemed a safe answer since no one could ever tell me I was wrong.

There is careful futurism, where you try to consider all the biases you know, and separate your analysis into logical parts, and put confidence intervals around things, and use wider confidence intervals where you have less constraining knowledge, and all that other stuff rationalists do. Then there is sloppy futurism, where you just make something up that sounds neat. This sounds like sloppy futurism to me.

So, basically, Schank made a fantastic amazing futuristic prediction about machines “as smart as we are” “within my lifetime”—two phrases that themselves reveal some shaky assumptions.

Then Schank got all sad and disappointed because he wasn’t making progress as fast as he hoped.

So Schank made a different futuristic prediction, about special-purpose AIs that will answer your questions about shipping disasters. It wasn’t quite as shiny and futuristic, but it matched his new saddened mood, and it gave him something to say to reporters when they asked him where AI would be in 2050.

This is how the vast majority of futurism is done. So until I have reason to believe there is something more to Schank’s analysis than this, I don’t feel very guilty about disagreeing with him when I make “predictions” like:

If you don’t know much about a problem, you should widen your confidence intervals in both directions. AI seems very hard to you because you don’t know how to do it. But translating that feeling into a confident prediction of a very long time interval would express your ignorance as if it were positive knowledge: the less you know, the broader your confidence interval should be, in both directions.


You don’t know what theoretical insights will be required for AI, or you would already have them. Theoretical breakthroughs can happen without advance warning (the warning is perceived in retrospect, of course, but not in advance); and they can be arbitrarily large. We know it is difficult to build a star from hydrogen atoms in the obvious way—because we understand how stars work, so we know that the work required is a huge amount of drudgery.


Looking at the anthropological trajectory of hominids seems to strongly contradict the assertion that exponentially increasing amounts of processing power or programming time are required for the production of intelligence in the vicinity of human; even when using an evolutionary algorithm that runs on blind mutations, random recombination, and selection with zero foresight.

But if I don’t want this post to go on forever, I had better stop it here. See this paper, however.