Bounding the impact of AGI

For those of you interested, András Kornai’s paper “Bounding the impact of AGI” from this year’s AGI-Impacts conference at Oxford had a few interesting ideas (which I’ve excerpted below).

Summary:

  1. Acceptable risk tolerances for AGI design can be determined using standard safety engineering techniques from other fields

  2. Mathematical proof is the only available tool to secure the tolerances required to prevent intolerable increases in x-risk

  3. Automated theorem proving will be required so that the proof can reasonably be checked by multiple human minds

Safety engineering

Since the original approach of Yudkowsky (2006) to friendly AI, which sought mathematical guarantees of friendliness, was met with considerable skepticism, we revisit the issue of why such guarantees are essential. In designing radioactive equipment, a reasonable guideline is to limit emissions to several orders of magnitude below the natural background radiation level, so that human-caused dangers are lost in the noise compared to the pre-existing threat we must live with anyway. In the full paper, we take the “big five” extinction events that occurred within the past half billion years as background, and argue that we need to design systems with a failure rate below 10^-63 per logical operation.
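
To make this style of budgeting concrete, here is a back-of-the-envelope sketch (my own illustration, not the paper’s derivation; the safety margin and operation count below are hypothetical placeholders):

```python
# Back-of-the-envelope failure budgeting in the spirit of the argument above.
# All numbers are illustrative assumptions, not the paper's actual figures.

background_rate = 5 / 5e8   # "big five" extinction events over ~half a billion years, per year
safety_margin = 1e-5        # keep AGI-caused risk several orders of magnitude below background
ops_per_year = 1e50         # assumed logical operations executed per year by deployed AGI systems

# Tolerable probability of an AGI-caused catastrophe, per year
tolerable_risk_per_year = background_rate * safety_margin

# Tolerable failure rate per logical operation
per_op_budget = tolerable_risk_per_year / ops_per_year

print(f"tolerable risk per year: {tolerable_risk_per_year:.0e}")
print(f"failure budget per logical operation: {per_op_budget:.0e}")
```

With these placeholder numbers the budget comes out at roughly 10^-63 per operation, in the vicinity of the figure quoted above; the full paper should be consulted for the actual assumptions.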

What needs to be emphasized in the face of this requirement is that the very best physical measurements have only one part in 10^17 precision, not to speak of social and psychological phenomena, where our understanding is considerably weaker. What this means is that guarantees of the requisite sort can only be expected from mathematics, where our measurement precision is already considerably better.

How reliable is mathematics?

The period since World War II has brought incredible advances in mathematics, such as the Four Color Theorem (Appel and Haken 1976), Fermat’s Last Theorem (Wiles 1995), the classification of finite simple groups (Gorenstein 1982, Aschbacher 2004), and the Poincaré conjecture (Perelman 1994). While the community of mathematicians is entirely convinced of the correctness of these results, few individual mathematicians are, as the complexity of the proofs, both in terms of knowledge assumed from various branches of mathematics and in terms of the length of the deductive chain, is generally beyond our ken. Instead of a personal understanding of the matter, most of us now rely on argumentum ad verecundiam: well, Faltings and Ribet now think that the Wiles-Taylor proof is correct, and even if I don’t know Faltings or Ribet, at least I know and respect people who know and respect them, and if that’s not good enough I can go and devote a few years of my life to understanding the proof for good. Unfortunately, the communal checking of proofs often takes years, and sometimes errors are discovered only after a decade has passed: the hole in the original proof of the Four Color Theorem (Kempe 1879) was detected by Heawood in 1890. Tomonaga in his Nobel lecture (1965) describes how his team’s work in 1947 uncovered a major problem in Dancoff (1939):

Our new method of calculation was not at all different in its contents from Dancoff’s perturbation method, but had the advantage of making the calculation more clear. In fact, what took a few months in the Dancoff type of calculation could be done in a few weeks. And it was by this method that a mistake was discovered in Dancoff’s calculation; we had also made the same mistake in the beginning.

To see that such long-hidden errors are by no means a thing of the past, and to observe the ‘web of trust’ method in action, consider the following example from Mohr (2012).

The eighth-order coefficient A_1^(8) arises from 891 Feynman diagrams of which only a few are known analytically. Evaluation of this coefficient numerically by Kinoshita and coworkers has been underway for many years (Kinoshita, 2010). The value used in the 2006 adjustment is A_1^(8) = −1.7283(35) as reported by Kinoshita and Nio (2006). However, (...) it was discovered by Aoyama et al. (2007) that a significant error had been made in the calculation. In particular, 2 of the 47 integrals representing 518 diagrams that had not been confirmed independently required a corrected treatment of infrared divergences. (...) The new value is (Aoyama et al., 2007) A_1^(8) = −1.9144(35); (111) details of the calculation are given by Aoyama et al. (2008). In view of the extensive effort made by these workers to ensure that the result in Eq. (111) is reliable, the Task Group adopts both its value and quoted uncertainty for use in the 2010 adjustment.

Assuming no more than three million mathematics and physics papers published since the beginnings of scientific publishing, and no less than the three errors documented above, we can safely conclude that the overall error rate of the reasoning used in these fields is at least 10^-6 per paper.
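
Spelled out, the bound follows from dividing the minimum error count by the maximum paper count:

$$\text{error rate} \;\geq\; \frac{3\ \text{errors}}{3\times 10^{6}\ \text{papers}} \;=\; 10^{-6}\ \text{per paper}.$$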

The role of automated theorem-proving

That human reasoning, much like manual arithmetic, is a significantly error-prone process comes as no surprise. Starting with de Bruijn’s Automath (see Nederpelt et al. 1994), logicians and computer scientists have invested significant effort in mechanized proof checking, and it is indeed only through such efforts, in particular through the Coq verification (Gonthier 2008) of the entire logic behind the Appel and Haken proof, that all lingering doubts about the Four Color Theorem were laid to rest. The error in A_1^(8) was also identified by using FORTRAN code generated by an automatic code generator (Mohr et al. 2012).
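
For readers who have never seen a mechanized proof, here is a toy illustration of the workflow (my own example in Lean 4, rather than the Coq used by the projects above): if the file compiles, the proof checker’s kernel, not a human referee, has certified every inference step.

```lean
-- Toy machine-checked statements (Lean 4; the projects cited above use Coq).
-- The point is only the workflow: compilation means the kernel verified the proofs.

theorem two_plus_two : 2 + 2 = 4 :=
  rfl  -- holds by direct computation, checked by the kernel

theorem add_comm_nat (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b  -- reuse of a library lemma that was itself kernel-checked
```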

To gain an appreciation of the state of the art, consider the theorem that finite groups of odd order are solvable (Feit and Thompson 1963). The proof, which took two humans about two years to work out, takes up an entire issue of the Pacific Journal of Mathematics (255 pages), and it was only this year that a fully formal proof was completed by Gonthier’s team (see Knies 2012). The effort, 170,000 lines, 15,000 definitions, and 4,200 theorems in Coq terms, took person-decades of human assistance (15 people working six years, though many of them part-time), even after the toil of Bender and Glauberman (1995) and Peterfalvi (2000), who greatly cleaned up and modularized the original proof, in which elementary group-theoretic and character-theoretic argumentation was completely intermixed.

The classification of finite simple groups is two orders of magnitude bigger: the effort involved about 100 humans, and the original proof is scattered among 20,000 pages of papers, the largest (Aschbacher and Smith 2004a,b) taking up two volumes totaling some 1,200 pages. While everybody capable of rendering meaningful judgment considers the proof to be complete and correct, it must be somewhat worrisome at the 10^-64 level that there are no more than a couple of hundred such people, and most of them have something of a vested interest in that they themselves contributed to the proof. Let us suppose that people who are convinced that the classification is bug-free are offered the following bet by some superior intelligence that knows the answer. You must enter a room with as many people as you can convince to come with you and push a button: if the classification is bug-free, you will each receive $100; if not, all of you will immediately die. Perhaps fools rush in where angels fear to tread, but on the whole we still wouldn’t expect too many takers.

Whether the classification of finite simple groups is complete and correct is very hard to say – the planned second-generation proof will still be 5,000 pages, and a mechanized proof is not yet in sight. But this is not to say that gaining mathematical knowledge of the required degree of reliability is hopeless; it’s just that instead of monumental chains of abstract reasoning we need to retreat to considerably simpler ones. Take, for example, the first Sylow Theorem: if the order of a finite group G is divisible by some prime power p^n, G will have a subgroup H of this order. We are absolutely certain about this. Argumentum ad verecundiam of course is still available, but it is not needed: anybody can join the hive-mind by studying the proof. The Coq verification contains 350 lines, 15 definitions, and 90 theorems, and took 2 people 2 weeks to produce. The number of people capable of rendering meaningful judgment is at least three orders of magnitude larger, and the vast majority of those who know the proof would consider betting their lives on the truth of this theorem an easy way of winning $100 with no downside risk.

Further remarks

Not only do we have to prove that the planned AGI will be friendly; the proof itself has to be short enough to be verifiable by humans. Consider, for example, the fundamental theorem of algebra. Could it be the case that we, humans, are all deluded into thinking that an n-th degree polynomial will have n roots? Yes, but this is unlikely in the extreme. If this so-called theorem is really a trap laid by a superior intelligence, we are doomed anyway; humanity can find its way around it no more than a bee can find its way around the windowpane. Now consider the Four Color Theorem, which is still outside the human-verifiable range. It is fair to say that it would be unwise to create AIs whose friendliness critically depends on design limits implied by the truth of this theorem, while AIs whose friendliness is guaranteed by the fundamental theorem of algebra represent a tolerable level of risk.
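
For precision’s sake (my gloss, not the paper’s): the theorem in question says that over the complex numbers a degree-n polynomial has exactly n roots, counted with multiplicity:

$$p(z) = a_n z^n + \dots + a_1 z + a_0,\quad a_n \neq 0 \;\Longrightarrow\; p(z) = a_n \prod_{k=1}^{n} (z - r_k) \ \text{ for some } r_1, \dots, r_n \in \mathbb{C}.$$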

Recently, Goertzel and Pitt (2012) have laid out a plan to endow AGI with morality by means of carefully controlled machine learning. Much as we are in agreement with their goals, we remain skeptical about their plan meeting the plain failure engineering criteria laid out above.