Neural program synthesis is a dangerous technology

Crossposted from my Medium blog

Today’s computer viruses can copy themselves between computers, but they can’t mutate. That might change very soon, leading to a serious crisis.

Research on program synthesis using neural reinforcement learning is proceeding quickly. With a similar approach, researchers showed in 2017 that a computer can teach itself to master chess and Go entirely from scratch.

The programs currently being generated are very small: dividing two numbers, reversing a string of letters, etc. It could be that there’s another chasm to cross before programs for more interesting tasks can be generated. But what if there’s not?

Consider the problem of automatically generating a program that can install itself on another computer on the network. To make this work, the statistical process generating the program needs some way to know which partial programs are most promising. Metaphorically, the “AI” needs some way to “know” when it’s “on the right track”.

Imagine you were the one generating this program, and you knew nothing except that it should be a sequence of characters. Even knowing nothing, if you could get feedback on each character, you’d quickly be able to get the result through sheer trial and error. You would just have to try each character one by one, keeping the character that performed best. If you only got feedback on each character pair, your life would be harder, but you’d still get there eventually. If feedback only came after each complete line, you probably wouldn’t get to the answer in a human lifetime.
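To make the feedback-granularity point concrete, here is a toy sketch of that trial-and-error dynamic: a hill climber recovering a fixed target string by random single-character mutations. The target string, alphabet, and step budget are arbitrary stand-ins chosen for illustration, not anything taken from real program-synthesis work.

```python
import random
import string

ALPHABET = string.ascii_lowercase + " "
TARGET = "a sequence of characters"  # arbitrary stand-in for the thing we want to produce

def dense_score(candidate):
    """Per-character feedback: how many positions already match the target."""
    return sum(c == t for c, t in zip(candidate, TARGET))

def sparse_score(candidate):
    """All-or-nothing feedback: reward only when the whole string is correct."""
    return 1 if candidate == TARGET else 0

def hill_climb(score, max_steps=200_000):
    """Mutate one random character at a time, keeping any change that does not
    lower the score. Returns the number of steps taken, or None on failure."""
    current = [random.choice(ALPHABET) for _ in TARGET]
    for step in range(max_steps):
        if "".join(current) == TARGET:
            return step
        i = random.randrange(len(TARGET))
        proposal = list(current)
        proposal[i] = random.choice(ALPHABET)
        if score("".join(proposal)) >= score("".join(current)):
            current = proposal
    return None

print("steps with per-character feedback:", hill_climb(dense_score))
print("steps with all-or-nothing feedback:", hill_climb(sparse_score))
```

Run with per-character feedback, the climber typically finishes in a few thousand mutations; with all-or-nothing feedback it exhausts its budget, because only one string in an astronomically large space ever produces any reward.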

The simple point here is that the task of generating malware automatically will be difficult if the “reward” is all-or-nothing: if we only know we’re on the right track once the program is fully successful. If partial success can be rewarded, it will be much easier to find the correct solution.

It might be that there are lots of obvious intermediate rewards that make generating malware very easy. Exiting without error, connecting to the target IP, and sustaining long, varied communication all seem like good signs that the program is on the right track.

If it turns out that writing a virus is much easier than mastering chess, we could be in a lot of trouble. Every time the virus infects another computer, the new host can be put to work improving and spreading the malware. The improvements apply to the process of generating the malware, rather than to any single exploit. Once the attack gains momentum, it could be finding and exploiting new vulnerabilities faster than they can be patched. Vulnerabilities could be uncovered and exploited at any level, including low-level firmware.

People love getting into philosophical debates about the hard problem of consciousness, and whether “real” artificial intelligence is possible or likely. Debating these things is as fine a hobby as any, but we shouldn’t lose sight of the fact that mutating self-replicating systems don’t need anything like “general intelligence” to pose a huge threat.

For the first big AI safety crisis, think robot flu, not robot uprising.
