[Question] AI Boxing for Hardware-bound agents (aka the China alignment problem)

I feel like I fit into a small group of peo­ple who are both:

1) Very con­fi­dent AI will be de­vel­oped in the next few decades

2) Not ter­ribly wor­ried about AI alignment

Most peo­ple around here seem to fall into ei­ther the camp “AI is always 50 years away” or “Oh my gosh Skynet is go­ing to kill us all!”.

This is my attempt to explain why:

A) I’m less wor­ried than a lot of peo­ple around here.

B) I think a lot of the AI al­ign­ment work is fol­low­ing a pretty silly re­search path.


Sorry this is so long.

The short ver­sion is: I be­lieve we will see a slow take­off in which AI is de­vel­oped si­mul­ta­neously in sev­eral places around the globe. This means we need to fo­cus on build­ing in­sti­tu­tions not soft­ware.

!!!Edit: Im­por­tant Clarification

I ap­par­ently did a poor job writ­ing this, since mul­ti­ple peo­ple have com­mented “Wow, you sure hate China!” or “Are you say­ing China tak­ing over the world is worse than a pa­per­clip max­i­mizer!?”

That is not what this post is about!

What this post is about is:

Sup­pose you were aware that an en­tity was soon to come into ex­is­tence which would be much more pow­er­ful than you are. Sup­pose fur­ther that you had limited faith in your abil­ity to in­fluence the goals and val­ues of that en­tity. How would you at­tempt to en­g­ineer the world so that nonethe­less af­ter the rise of that en­tity, your val­ues and ex­is­tence con­tinue to be pro­tected?

What this post is not about:

An AI perfectly al­igned with the in­ter­ests of the Chi­nese gov­ern­ment would not be worse than a pa­per­clip max­i­mizer (or your preferred bad out­come to the sin­gu­lar­ity). An AI perfectly al­igned with the con­sis­tently ex­trap­o­lated val­ues of the Chi­nese would prob­a­bly be pretty okay, since the Chi­nese are hu­man and share many of the same val­ues as I do. How­ever in a world with a slow take­off I think it is un­likely any sin­gle AI will dom­i­nate, much less one that hap­pens to perfectly ex­trap­o­late the val­ues of any one group or in­di­vi­d­ual.

Foom, the clas­sic AI-risk scenario

Gen­er­ally speak­ing the AI risk crowd tells a very sim­ple story that goes like this:

1) Even­tu­ally AI will be de­vel­oped ca­pa­ble of do­ing all hu­man-like in­tel­lec­tual tasks

2) Im­prov­ing AI is one of these tasks

3) Goto 1

The claim is that “Any AI worth its salt will be capable of writing an even better AI, which will be capable of building an even better AI, which will be capable of....” so within (hours? minutes? days?) AI will have gone from human-level to galactically beyond anything humanity is capable of doing.

I pro­pose an al­ter­na­tive hy­poth­e­sis:

By the time hu­man-level AI is achieved, most of the low-hang­ing fruit in the AI im­prove­ment do­main will have already been found, so sub­se­quent im­prove­ments in AI ca­pa­bil­ity will re­quire a su­per­hu­man level of in­tel­li­gence. The first hu­man-level AI will be no more ca­pa­ble of re­cur­sive-self-im­prove­ment than the first hu­man was.

Note: This does not mean that re­cur­sive self-im­prove­ment is a thing that is go­ing to stop hap­pen­ing, or that the de­vel­op­ment of hu­man-level AI will not have profound eco­nomic, sci­en­tific and philo­soph­i­cal con­se­quences. What this means is, the first AI is go­ing to take some se­ri­ous time and com­pute power to out-com­pete 200 plus years worth of hu­man effort on de­vel­op­ing ma­chines that think.

What the first AI looks like in each of these sce­nar­ios:

Foom: One day, some hacker in his mom’s basement writes an algorithm for a recursively self-improving AI. Ten minutes later, this AI has conquered the world and converted Mars into paperclips.

Moof: One day, after 5 years of arduous effort, Google finally finishes training the first human-level AI. Its intelligence is approximately that of a 5-year-old child. Its first publicly uttered sentence is “Mama, I want to watch Paw Patrol!” A few years later, anybody can “summon” a virtual assistant with human-level intelligence from their phone to do their bidding. But people have been using virtual AI assistants on their phones since the mid-2010s, so nobody is nearly as shocked as a time-traveler from the year 2000 would be.

What is the key differ­ence be­tween these sce­nar­ios? (Soft­ware vs Hard­ware bound AI)

In the Foom sce­nario, the key limit­ing re­source or bot­tle­neck was the ex­is­tence of the cor­rect al­gorithm. Once this al­gorithm was found, the AI was able to edit its own source-code, lead­ing to dra­matic re­cur­sive self-im­prove­ment.

In the Moof sce­nario, the key limit­ing re­sources were hard­ware and “train­ing effort”. Build­ing the first AI re­quired mas­sively more com­pute power and train­ing data than run­ning the first AI, and also mas­sively more than the first AI had ac­cess to.

Does this mean that the development of human-level AI might not surprise us? Or that by the time human-level AI is developed it will already be old news? I don’t know. That depends on whether or not you were surprised by the development of AlphaGo.

If, on the one hand, you had seen that since the 1950s computer AIs had been capable of beating humans at increasingly difficult games, that progress in this domain had been fairly steady and mostly limited by compute power, and moreover that computer Go programs had themselves gone from idiotic to high-amateur level over the course of decades, then the development of AlphaGo (if not the exact timing of that development) probably seemed inevitable.

If, on the other hand, you thought that playing Go was a uniquely human skill that required the ability to think creatively which machines could never ever replicate, then AlphaGo probably surprised you.

For the record, I was surprised at how soon AlphaGo happened, but not that it happened.

What ar­gu­ments are there in fa­vor of (or against) Hard­ware Bound AI?

The strongest argument in favor of hardware-bound AI is that in areas of intense human interest, the key “breakthroughs” tend to be found by multiple people independently, suggesting they are a result of conditions being correct rather than the existence of a lone genius.

Consider: Writing was independently invented at a minimum in China, Mesoamerica, and the Middle East. Calculus was developed by both Newton and Leibniz. There are half a dozen people who claim to have beaten the Wright brothers to the first powered flight. Artificial neural networks had been a topic of research for 50 years before the deep-learning revolution.

The strongest ar­gu­ment against Hard­ware Bound AI (and in fa­vor of Foom) is that we do not cur­rently know the al­gorithm that will be used to de­velop a hu­man level in­tel­li­gence. This leaves open the pos­si­bil­ity that a soft­ware break­through will lead to rapid progress.

However, I argue that not only will the “correct algorithm” be known well in advance of the development of human-level AI, but it will be widely deployed as well. I say this because we have every reason to believe that the algorithm that produces human-level AI in humans is the same algorithm that produces chimpanzee-level AI in chimps, dog-level AI in dogs and mouse-level AI in mice, if not cockroach-level AI in cockroaches. The evolutionary changes from chimpanzee to human were largely of scale and function, not some revolutionary new brain architecture.

Why should we ex­pect dog-AI or chimp AI to be de­vel­oped be­fore hu­man-AI? Be­cause they will be use­ful and be­cause con­sid­er­able eco­nomic gain will go to their de­vel­oper. Imag­ine an AI that could be trained as eas­ily as a dog, but whose train­ing could then be in­stantly copied to mil­lions of “dogs” around the planet.

Furthermore, once dog-AI is developed, billions of dollars of research and investment will be spent improving it to make sure its software and hardware run as efficiently as possible. Consider the massive effort that has gone into the development of software like TensorFlow or hardware like Google’s TPUs. If there were a “trick” that would make dog-AI even 2x as powerful (or energy efficient), researchers would be eager to find it.

What does this mean for AI al­ign­ment? (Or, what is the China Align­ment prob­lem?)

Does the be­lief in hard­ware-bound AI mean that AI al­ign­ment doesn’t mat­ter, or that the de­vel­op­ment of hu­man-level AI will be a rel­a­tive non-event?


No. Rather, it means that when thinking about AI risk, we should think of AI less as a single piece of software and more as a coming economic shift that will be widespread and unstoppable well before it officially “happens”.

Sup­pose, liv­ing in the USA in the early 1990′s, you were aware that there was a na­tion called China with the po­ten­tial to be vastly more eco­nom­i­cally pow­er­ful than the USA and whose ideals were vastly differ­ent from your own. Sup­pose, fur­ther, that rather than try­ing to stop the “rise” of China, you be­lieved that de­vel­op­ing China’s vast eco­nomic and in­tel­lec­tual po­ten­tial could be a great boon for hu­mankind (and for the Chi­nese them­selves).

How would you go about trying to “contain” China’s rise? That is, how would you make sure that at whatever moment China’s power surpassed your own, you would face a benevolent rather than a hostile opponent?

Well, you would prob­a­bly do some game the­ory. If you could con­vince the Chi­nese that benev­olence was in their own best in­ter­est while they were still less-pow­er­ful than you, per­haps you would have a chance of in­fluenc­ing their ide­ol­ogy be­fore they be­came a threat. At the very least your goals would be the fol­low­ing:

1) Non-ag­gres­sion. You should make it perfectly clear to the Chi­nese that force will be met with over­whelming force and should they show hos­tility, they will suffer.

2) Pos­i­tive-sum games. You should en­gage China in mu­tual-eco­nomic gain, so that they re­al­ize that peace­ful in­ter­course with you is bet­ter than the al­ter­na­tive.

3) Global in­sti­tu­tions. You should es­tab­lish a se­ries of global in­sti­tu­tions that en­shrine the val­ues you hold most dear (hu­man rights, free­dom of speech) and make clear that only en­tities that re­spect these val­ues (at least on an in­ter­na­tional stage) will be wel­comed to the “club” of de­vel­oped na­tions.

Con­trast this with tra­di­tional AI al­ign­ment, which is fo­cused on de­vel­op­ing the “right soft­ware” so that the first hu­man-level AI will have the same core val­ues as hu­man be­ings. Not only does this re­quire you to have a perfect de­scrip­tion of hu­man val­ues, you must also figure out how to en­code those val­ues in a re­cur­sively self-im­prov­ing pro­gram, and make sure that your soft­ware is the first to achieve Foom. If any­one any­where de­vel­ops an AI based off of soft­ware that is not perfect be­fore you, we’re all doomed.

AI Box­ing Strate­gies for Hard­ware Bound AI

AI box­ing is ac­tu­ally very easy for Hard­ware Bound AI. You put the AI in­side of an air-gapped fire­wall and make sure it doesn’t have enough com­pute power to in­vent some novel form of trans­mis­sion that isn’t known to all of sci­ence. Since there is a con­sid­er­able com­pu­ta­tional gap be­tween use­ful AI and “all of sci­ence”, you can do quite a bit with an AI in a box with­out wor­ry­ing too much about it go­ing rogue.

Un­for­tu­nately, AI box­ing is also a bit of a lost cause. It’s fine if your AI is nicely con­tained in a box. How­ever, your com­peti­tor in In­dia has been de­ploy­ing AI on the in­ter­net do­ing fi­nan­cial trad­ing for a decade already. An AI that is pro­grammed to make as much money as pos­si­ble trad­ing stocks and is al­lowed to buy more hard­ware to do so has all of the means, mo­tive, and op­por­tu­nity to be a threat to hu­mankind.

The only vi­able strat­egy is to make sure that you have a pile of hard­ware of your own that you can use to com­pete eco­nom­i­cally be­fore get­ting swamped by the other guy. The safest path isn’t to limit AI risk as much as pos­si­ble, but rather to make sure that agents you have friendly eco­nomic re­la­tions with rise as quickly as pos­si­ble.

What re­search can I per­son­ally in­vest in to max­i­mize AI safety?

If the biggest threat from AI comes not from AI Foom but from Chinese-owned AI with a hostile world-view, and if, like me, you consider the liberal values held by the Western world something worth saving, then the single best course of action you can take is to make sure those liberal values have a place in the coming AI-dominated economy.

This means:

1) Mak­ing sure that liberal west­ern democ­ra­cies con­tinue to stay on the cut­ting-edge of AI de­vel­op­ment.

2) En­sur­ing that global in­sti­tu­tions such as the UN and WTO con­tinue to em­brace and ad­vance ideals such as free-trade and hu­man-rights.

Keep­ing the West ahead

Ad­vanc­ing AI re­search is ac­tu­ally one of the best things you can do to en­sure a “peace­ful rise” of AI in the fu­ture. The sooner we dis­cover the core al­gorithms be­hind in­tel­li­gence, the more time we will have to pre­pare for the com­ing rev­olu­tion. The worst-case sce­nario still is that some time in the mid 2030′s a sin­gle re­search team comes up with a rev­olu­tion­ary new soft­ware that puts them miles ahead of any­one else. The more evenly dis­tributed AI re­search is, the more mu­tu­ally benefi­cial eco­nomic games will en­sure the peace­ful rise of AI.

I ac­tu­ally think there is con­sid­er­able work that can be done right now to de­velop hu­man-level AI. While I don’t think that Moore’s law has yet reached the level re­quired to de­velop hu­man AI, I be­lieve we’re ap­proach­ing “dog-level” and we are un­doubt­edly well be­yond “cock­roach level”. Se­ri­ous work on de­vel­op­ing sub-hu­man AI not only ad­vances the cause of AI safety, but will also provide enor­mous eco­nomic benefits to all of us liv­ing here on earth.

Personally, I think one fruitful area in the next few years will be the combination of deep learning with “classical AI” to develop models that can make novel inferences and exhibit “one shot” or “few shot” learning. The combination of a classic method (Monte Carlo tree search) and deep learning is what made AlphaGo so powerful.

Imagine an AI that was capable of making general inferences about the world, where the inferences themselves were about fuzzy categories extracted through deep learning and self-play. For example it might learn “all birds have wings”, where “bird” and “wing” refer to different activations in a deep learning network but the sentence “all birds have wings” is encoded in an expert-system-like collection of facts. The system would then progressively expand and curate its set of facts, keeping the ones that were most useful for making predictions about the real world. Such a system could be trained on a YouTube-scale video corpus, or on a simulated environment such as Skyrim or Minecraft.

Build­ing institutions

In ad­di­tion to mak­ing sure that AI isn’t de­vel­oped first by an or­ga­ni­za­tion hos­tile to Western liberal val­ues, we also need to make sure that when AI is de­vel­oped, it is born into a world that en­courages its peace­ful de­vel­op­ment. This means pro­mot­ing norms of liberty, free trade and pro­tec­tion of per­sonal prop­erty. In a world with mul­ti­ple ac­tors trad­ing freely, the op­ti­mal strat­egy is one of trade and co­op­er­a­tion. Violence will only be met with coun­ter­vailing force.

This means we need to strengthen our in­sti­tu­tions as well as our al­li­ances. The more we can en­shrine prin­ci­ples of liberty in the ba­sic in­fras­truc­ture of our so­ciety, the more likely they will sur­vive. This means build­ing an in­ter­net and fi­nan­cial net­work that re­sists surveillance and cen­sor­ship. Cur­rently blockchain is the best plat­form I am aware of for this.

This also means de­vel­op­ing global norms in which vi­o­lence is met with col­lec­tive ac­tion against the ag­gres­sor. When Rus­sia in­vades Ukraine or China in­vades Taiwan, the world can­not sim­ply turn a blind eye. Tit-for-tat like strate­gies can en­courage the evolu­tion of pro-so­cial or at least ra­tio­nal AI en­tities.
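To make the tit-for-tat point concrete, here is a minimal sketch of an iterated prisoner's dilemma. The payoff numbers are the conventional textbook values and the strategy and function names are made up for illustration; nothing here is meant as a model of real agents or nations.

```python
# Payoffs (player A, player B) per round. "C" = cooperate, "D" = defect.
# These are the usual textbook values, assumed purely for illustration.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy whatever the opponent did last round."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    """Defect every round regardless of what the opponent does."""
    return "D"


def always_cooperate(opponent_history):
    """Cooperate every round, i.e. turn a blind eye to aggression."""
    return "C"


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return the total scores (a, b)."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees only the other side's past moves.
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pay_a, pay_b = PAYOFF[(a, b)]
        score_a += pay_a
        score_b += pay_b
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))         # (30, 30)
    print(play(tit_for_tat, always_defect))       # (9, 14)
    print(play(always_cooperate, always_defect))  # (0, 50)
```

Under these assumed payoffs, an aggressor facing tit-for-tat ends with 14 points over ten rounds, less than half the 30 it would have earned by cooperating, while an aggressor facing an unconditional cooperator walks away with 50. That gap is the incentive the paragraph above is relying on.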

Fi­nally, we need to make sure that Western liberal democ­racy sur­vives long enough to hand off the reins to AI. This means we need to se­ri­ously ad­dress prob­lems like sec­u­lar stag­na­tion, cli­mate change, and eco­nomic in­equal­ity.

When will hu­man-level AI be de­vel­oped?

I largely agree with Me­tac­u­lus that it will hap­pen some­time be­tween 2030 and 2060. I ex­pect that we will see some pretty amaz­ing break­throughs (dog-level AI) in the next few years. One group whose po­ten­tial I think is slightly un­ap­pre­ci­ated is Tesla. They have both a need (self-driv­ing) and the means (video data from mil­lions of cars) to make a huge break­through here. Google, Ama­zon, and who­ever is build­ing the surveillance state in China are also ob­vi­ous places to watch.

One important idea is that of AI fire-alarms. Mine personally was AlphaGo, which caused me to update from “eventually” to “soon”. The next fire-alarm will be an AI that can react to a novel environment with a human-like amount of training data. Imagine an AI that can learn to play Super Mario in only a few hours of gameplay, or an AI that can learn a new card game just by playing with a group of humans for a few hours. When this happens, I will update from “soon” to “very soon”.

What are your cre­dences? (How much would you be will­ing to bet?)

Foom vs Moof:

I think this is a bit of a sucker bet, since if Foom happens we’re (probably) all dead. But I would be willing to bet at least 20:1 against Foom. One form this bet might take is “Will the first human-level AI be trained on hardware costing more or less than $1 million (inflation adjusted)?”

When will AGI hap­pen?

I would be will­ing to take a bet at 1:1 odds that hu­man-level AI will not hap­pen be­fore 2030.

I will not take a fair bet that human-level AI will happen before 2060, since it’s possible that Moore’s law will break down in some way I cannot predict. I might take such a bet at 1:3 odds.


I will take a bet at 10:1 odds that hu­man-level AI will be de­vel­oped be­fore we have a work­ing ex­am­ple of “al­igned AI”, that is an AI al­gorithm that prov­ably in­cor­po­rates hu­man val­ues in a way that is ro­bust against re­cur­sive self-im­prove­ment.

Pos­i­tive out­come to the sin­gu­lar­ity:

This is even more of a sucker bet than Foom vs Moof. How­ever, my be­lief is closer to 1:1 than it is to 100:1, since I think there is a real dan­ger that a hos­tile power such as China de­vel­ops AI be­fore us, or that we haven’t de­vel­oped suffi­ciently ro­bust in­sti­tu­tions to sur­vive the dra­matic eco­nomic up­heaval that hu­man-level AI will pro­duce.

Tesla vs Google:

I would be will­ing to bet 5:1 that Tesla will pro­duce a mass-mar­ket self-driv­ing car be­fore Google.
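For readers who prefer probabilities to odds, here is a small sketch of the conversion. It assumes that "betting X:Y" means staking X units to win Y units on the stated outcome, which is one common reading but not the only one; the function name is made up for illustration.

```python
from fractions import Fraction


def implied_probability(stake, winnings):
    """Break-even probability for a bettor who risks `stake` to win `winnings`.

    The bet has non-negative expected value only if the bettor believes
    the outcome is at least this likely.
    """
    return Fraction(stake, stake + winnings)


if __name__ == "__main__":
    # Betting 20:1 against Foom, read as staking 20 to win 1 on "no Foom".
    p_no_foom = implied_probability(20, 1)
    print(p_no_foom, float(1 - p_no_foom))  # 20/21, roughly 0.048 for Foom

    # A 1:1 bet breaks even at exactly 50%.
    print(implied_probability(1, 1))  # 1/2
```

On that reading, 20:1 against Foom corresponds to a credence in Foom of at most 1/21, or a little under 5%.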