Superintelligence via whole brain emulation

Most planning around AI risk seems to start from the premise that superintelligence will come from de novo AGI before whole brain emulation becomes possible. I haven't seen any analysis that assumes both uploads-first and the AI FOOM thesis (Edit: apparently I fail at literature searching), a deficiency that I'll try to get a start on correcting in this post.

It is likely possible to use evolutionary algorithms to efficiently modify uploaded brains. If so, uploads would likely be able to set off an intelligence explosion by running evolutionary algorithms on themselves, selecting for something like higher general intelligence.
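As a very rough illustration of the shape of such a process, here is a minimal sketch in Python. Everything in it is a toy stand-in of my own: a "brain" is just a list of numbers, the mutation operator is Gaussian noise, and the scoring function is an arbitrary objective, whereas in the scenario above scoring a candidate would mean giving a copy subjective time to attempt genuinely difficult tasks. The point is only the structure of the mutate-score-select loop.

```python
import random

# Toy stand-ins only: a "brain" is a list of numbers, "mutate" adds noise,
# and "score" is an arbitrary objective, not a measure of intelligence.

def mutate(brain, strength=0.05):
    return [w + random.gauss(0, strength) for w in brain]

def score(brain):
    return -sum((w - 1.0) ** 2 for w in brain)  # arbitrary toy objective

def evolve(seed_brain, generations=100, pop_size=50, keep=5):
    population = [seed_brain]
    for _ in range(generations):
        # Most random changes to a brain make it worse; selection keeps
        # only the rare improvements.
        candidates = [mutate(random.choice(population)) for _ in range(pop_size)]
        candidates.sort(key=score, reverse=True)
        population = candidates[:keep]
    return population[0]

best = evolve([random.random() for _ in range(10)])
```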

Since brains are poorly understood, it would likely be very difficult to select for higher intelligence without causing significant value drift. Thus, setting off an intelligence explosion in that way would probably produce unfriendly AI if done carelessly. On the other hand, the modified upload would eventually become capable of figuring out how to improve itself without causing significant further value drift, and it may be possible to reach that point before too much value drift has taken place. The expected amount of value drift can be decreased by having long generations between iterations of the evolutionary algorithm, giving the improved brains more time to figure out how to modify the evolutionary algorithm to minimize further value drift.
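To make that intuition slightly more concrete, here is a toy Monte Carlo model of my own (not from any literature, and only a crude analog of the mechanism described above): each iteration adds some random drift, and the chance that the current uploads notice and prevent that drift from sticking grows with the subjective review time they get per generation. All of the numbers are arbitrary.

```python
import random

def expected_drift(iterations, review_time, drift_per_iter=0.01, trials=5000):
    """Toy model: drift accumulates each iteration; more subjective review
    time per generation raises the chance of catching and reverting it."""
    total = 0.0
    for _ in range(trials):
        drift = 0.0
        p_catch = 1 - 2 ** (-review_time)  # more review time, higher catch rate
        for _ in range(iterations):
            step = abs(random.gauss(0, drift_per_iter))
            if random.random() > p_catch:
                drift += step
        total += drift
    return total / trials

# Same number of iterations, different review time per generation:
print(expected_drift(iterations=200, review_time=0.5))  # short generations
print(expected_drift(iterations=200, review_time=4.0))  # long generations
```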

Another possibility is that such an evolutionary algorithm could be used to create brains that are somewhat, but not vastly, smarter than humans, and hopefully with values not too divergent from ours. These brains would then stop using the evolutionary algorithm and instead use their intellects to research de novo Friendly AI, if that ends up looking easier than continuing to run the evolutionary algorithm without too much further value drift.

The strategies of using slow iterations of the evolutionary algorithm, or stopping it after not too long, require coordination among everyone capable of making such modifications to uploads. Thus, it seems safer for whole brain emulation technology to be either heavily regulated or owned by a monopoly, rather than being widely available and unregulated. This closely parallels the AI openness debate, and I'd expect people more concerned with bad actors relative to accidents to disagree.

With de novo artificial superintelligence, the overwhelmingly most likely outcomes are the optimal achievable outcome (if we manage to align its goals with ours) and extinction (if we don't). But uploads start out with human values, and when creating a superintelligence by modifying uploads, the goal would be to not corrupt them too much in the process. Since its values could get partially corrupted, an intelligence explosion that starts with an upload seems much more likely to result in outcomes that are both significantly worse than optimal and significantly better than extinction. Since human brains also already have a capacity for malice, this process also seems slightly more likely to result in outcomes worse than extinction.

The early ways to upload brains will probably be destructive, and may be very risky. Thus the first uploads may be selected for high risk-tolerance. Running an evolutionary algorithm on an uploaded brain would probably involve creating a large number of psychologically broken copies, since the average change to a brain will be negative. Thus the uploads that run evolutionary algorithms on themselves will be selected for not being horrified by this. Both of these selection effects seem like they would select against people who would take caution and goal stability seriously (uploads that run evolutionary algorithms on themselves would also be selected for being okay with creating and deleting spur copies, but this doesn't obviously correlate in either direction with caution). This could be partially mitigated by a monopoly on brain emulation technology. A possible (but probably smaller) source of positive selection is that currently, people who are enthusiastic about uploading their brains correlate strongly with people who are concerned about AI safety, and this correlation may continue once whole brain emulation technology is actually available.

Assuming that hardware speed is not close to being a limiting factor for whole brain emulation, emulations will be able to run at much faster than human speed. This should make emulations better able to monitor the behavior of AIs. Unless we develop ways of evaluating the capabilities of human brains that are much faster than giving them time to attempt difficult tasks, running evolutionary algorithms on brain emulations could only be done very slowly in subjective time (even though it may be quite fast in objective time), which would give emulations a significant advantage in monitoring such a process.
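Some back-of-the-envelope arithmetic may make this clearer. The speedup factor, evaluation time, and generation count below are made-up numbers purely for illustration:

```python
# Back-of-the-envelope arithmetic; all numbers are made-up illustrations.
speedup = 1000                # subjective seconds per wall-clock second for an emulation
eval_subjective_days = 30     # subjective time a candidate needs to attempt hard tasks
generations = 100             # serial generations of the evolutionary algorithm

wallclock_days_per_gen = eval_subjective_days / speedup

print(f"objective time per generation: {wallclock_days_per_gen * 24 * 60:.0f} minutes")
print(f"objective time for the whole run: {generations * wallclock_days_per_gen:.0f} days")
# A monitoring emulation running at the same speedup experiences each of
# those generations as a full 30 subjective days, so the process crawls
# from its point of view even though it is fast in objective time.
print(f"subjective time a monitor gets per generation: {eval_subjective_days} days")
```

Under these assumptions the whole run finishes in a few objective days, yet the monitoring emulations get a subjective month per generation to inspect what is happening.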

Although there are effects going in both directions, it seems like the uploads-first scenario is probably safer than de novo AI. If this is the case, then it might make sense to accelerate technologies that are needed for whole brain emulation if there are tractable ways of doing so. On the other hand, it is possible that technologies that are useful for whole brain emulation would also be useful for neuromorphic AI, which is probably very unsafe, since it is not amenable to formal verification or being given explicit goals (and unlike emulations, it doesn't start off already having human goals). Thus, it is probably important to be careful not to accelerate non-WBE neuromorphic AI while attempting to accelerate whole brain emulation. For instance, it seems plausible to me that getting better models of neurons would be useful for creating neuromorphic AIs while better brain scanning would not. Since both technologies are necessary for brain uploading, if that is true, it may make sense to work on improving brain scanning but not on improving neural models.