Autopoietic systems and difficulty of AGI alignment

I have recently come to the opinion that AGI alignment is probably extremely hard. But it’s not clear exactly what AGI or AGI alignment are, and there are some forms of alignment of “AI” systems that are easy. Here I operationalize “AGI” and “AGI alignment” in some different ways and evaluate their difficulties.


Autopoietic cognitive systems

From Wikipedia:

The term “autopoiesis” refers to a system capable of reproducing and maintaining itself.

This isn’t entirely technically crisp. I’ll elaborate on my usage of the term:

  • An autopoietic system expands, perhaps indefinitely. It will feed on other resources and, through its activity, gain the ability to feed on more things. It can generate complexity that was not present in the original system through e.g. mutation and selection. In some sense, an autopoietic system is like an independent self-sustaining economy.

  • An autopoietic system, in principle, doesn’t need an external source of autopoiesis. It can maintain itself and expand regardless of whether the world contains other autopoietic systems.

  • An autopoietic cognitive system is an autopoietic system that contains intelligent thinking.

Some examples:

  • A group of people on an island that can survive for a long time and develop technology is an autopoietic cognitive system.

  • Evolution is an autopoietic cognitive system (cognitive because it contains animals).

  • An economy made of robots that can repair themselves, create new robots, gather resources, develop new technology, etc. is an autopoietic cognitive system.

  • A moon base that necessarily depends on Earth for resources is not autopoietic.

  • A car is not autopoietic.

  • A computer with limited memory not connected to the external world can’t be autopoietic.

Fully automated autopoietic cognitive systems

A fully automated autopoietic cognitive system is an autopoietic cognitive system that began from a particular computer program running on a computing substrate such as a bunch of silicon computers. It may require humans as actuators, but doesn’t need humans for cognitive work, and could in principle use robots as actuators.

Some might use the term “recursively self-improving AGI” to mean something similar to “fully automated autopoietic cognitive system”.

The concept seems pretty similar to “strong AI”, though not identical.

Difficulty of aligning a fully automated autopoietic cognitive system

Creating a good and extremely-useful fully automated autopoietic cognitive system requires solving extremely difficult philosophical and mathematical problems. In some sense, it requires answering the question of “what is good” with a particular computer program. The system can’t rely on humans for its cognitive work, so in an important sense it has to figure out the world and what is good by itself. This requires “wrapping up” large parts of philosophy.

For some intuitions about this, it might help to imagine a particular autopoietic system: an alien civilization. Imagine an artificial planet running evolution at an extremely fast speed, eventually producing intelligent aliens that form a civilization. The result of this process would be extremely unpredictable, and there is not much reason to think it would be particularly good to humans (other than the decision-theoretic argument of “perhaps smart agents cooperate with less-smart agents that spawned them because they want this cooperation to happen in general”, which is poorly understood and only somewhat decision-relevant).

Almost-fully-automated autopoietic cognitive systems

An almost-fully-automated autopoietic cognitive system is an autopoietic cognitive system that receives some input from humans, but a quite-limited amount (say, less than 1,000,000 total hours from humans). After receiving this much data, it is autopoietic in the sense that it doesn’t require humans for doing its cognitive work. It does a very large amount of expansion and cognition after receiving this data.

Some examples:

  • Any “raise the AGI like you would raise a child” proposal falls in this category.

  • An AGI that thinks on its own but sometimes sends queries to humans would fall in this category.

  • ALBA doesn’t use the ontology of “autopoietic systems”, but if Paul Christiano’s research agenda succeeded, it would eventually produce an aligned almost-fully-automated autopoietic cognitive system (in order to be competitive with an unaligned almost-fully-automated autopoietic cognitive system).

Difficulty of aligning an almost-fully-automated autopoietic cognitive system

My sense is that creating a good and extremely-useful almost-fully-automated autopoietic cognitive system also requires solving extremely difficult philosophical and mathematical problems. Although getting data from humans will help in guiding the system, there is only a limited amount of guidance available (the system does a bunch of cognitive work on its own). One can imagine an artificial planet running at an extremely fast speed that occasionally pauses to ask you a question. This does not require “wrapping up” large parts of philosophy immediately, but it does require “wrapping up” large parts of philosophy in the course of the execution of the system.

(Of course, artificial planets running evolution aren’t the only autopoietic cognitive systems, but it seems useful to imagine life-based autopoietic cognitive systems in the absence of a clear alternative.)

Like with unaligned fully automated autopoietic cognitive systems, unaligned almost-fully-automated autopoietic cognitive systems would be extremely dangerous to humanity: the future of the universe would be outside of humanity’s hands.

My impression is that the main “MIRI plan” is to create an almost-fully-automated autopoietic cognitive system that expands to a high level, stops, and then assists humans in accomplishing some task. (See: executable philosophy; task-directed AGI.)

Non-autopoietic cognitive systems that extend human autopoiesis

An important category of cognitive systems is those that extend human autopoiesis without being autopoietic themselves. The Internet is one example of such a system: it can’t produce or maintain itself, but it extends human activity and automates parts of it.

This is similar to, but more expansive than, the concept of “narrow AI”, since in principle such systems could be domain-general (e.g. a neural net policy trained to generalize across different types of tasks). The concept of “weak AI” is similar.

Non-autopoietic automated cognitive systems can present existential risks, for the same reasons that other technologies and social organizations (nuclear weapons, surveillance technology, global dictatorship) present existential risks. But in an important sense, non-autopoietic cognitive systems are “just another technology”, contiguous with other automation technology, and managing them doesn’t require doing anything like wrapping up large parts of philosophy.

Where does Paul’s agenda fit in?

[edit: see this comment thread]

As far as I can tell, Paul’s proposal is to create an almost-fully-automated autopoietic system that is “seeded with” human autopoiesis in such a way that, though afterwards it grows without human oversight, it eventually does things that humans would find to be good. In an important sense, it extends human autopoiesis, though without many humans in the system to ensure stability over time. It avoids value drift over time through some “basin of attraction”, as in Paul’s post on corrigibility. (Paul can correct me if I got any of this wrong.)

In this comment, Paul says he is not convinced that lack of philosophical understanding is a main driver of risk, with the implication that humans can perhaps create aligned AI systems without understanding philosophy; this makes sense to the extent that AI systems are extending human autopoiesis and avoiding value drift rather than having their own original autopoiesis.

I wrote up some thoughts on Paul Christiano’s agenda already. Roughly, my take is that getting corrigibility right (i.e. getting an autopoietic system to extend human autopoiesis without much human oversight and without having value drift) requires solving very difficult philosophical problems, and it’s not clear whether these are easier or harder than those required for the “MIRI plan” of creating an almost-fully-automated autopoietic cognitive system that does not extend human autopoiesis but does assist humans in some task. Of course, I don’t have all of Paul’s intuitions on how to do corrigibility.

I would agree with Paul that, conditional on the AGI alignment problem not being very hard, this is probably because of corrigibility.

My position

I would summarize my position on AGI alignment as:

  • Aligning a fully automated autopoietic cognitive system and aligning an almost-fully-automated one both seem extremely difficult. My snap judgment is to assign about 1% probability to humanity solving this problem in the next 20 years. (My impression is that “the MIRI position” also puts a pretty low probability on this working, but doesn’t see a good alternative.)

  • Consistent with this expectation, I hope that humans do not develop almost-fully-automated autopoietic cognitive systems in the near term. I hope that they instead continue to develop and use non-autopoietic cognitive systems that extend human autopoiesis. I also hope that, if necessary, humans can coordinate to prevent the creation of unaligned fully-automated or almost-fully-automated autopoietic cognitive systems, possibly using non-autopoietic cognitive systems to help them coordinate.

  • I expect that thinking about how to align almost-fully-automated autopoietic cognitive systems with human values has some direct usefulness and some indirect usefulness (for increasing some forms of philosophical/mathematical competence), though actually solving the problem is very difficult.

  • I expect that non-autopoietic cognitive systems will continue to get better over time, and that their use will substantially change society in important ways.