AI Safety “Success Stories”

AI safety researchers often describe their long-term goals as building “safe and efficient AIs”, but they don’t always mean the same thing by this or other seemingly similar phrases. Asking about their “success stories” (i.e., scenarios in which their line of research helps contribute to a positive outcome) can help make clear what their actual research aims are. Knowing such scenarios also makes it easier to compare the ambition, difficulty, and other attributes of different lines of AI safety research. I hope this contributes to improved communication and coordination between different groups of people working on AI risk.

In the rest of the post, I describe some common AI safety success stories that I’ve heard over the years and then compare them along a number of dimensions. They are listed in roughly the order in which they first came to my attention. (Suggestions welcome for better names for any of these scenarios, as well as additional success stories and additional dimensions along which they can be compared.)

The Success Stories

Sovereign Singleton

AKA Friendly AI, an autonomous, superhumanly intelligent AGI that takes over the world and optimizes it according to some (perhaps indirect) specification of human values.

Pivotal Tool

An oracle or task AGI that can be used to perform a pivotal but limited act, after which it stops and waits for further instructions.

Corrigible Contender

A semi-autonomous AGI that does not have long-term preferences of its own but acts according to (its understanding of) the short-term preferences of some human or group of humans. It competes effectively, for resources and ultimately for influence on the future of the universe, with comparable AGIs corrigible to other users, as well as with unaligned AGIs (if any exist).

Interim Quality-of-Life Improver

AI risk can be minimized if world powers coordinate to limit AI capabilities development or deployment, in order to give AI safety researchers more time to figure out how to build a very safe and highly capable AGI. While that is proceeding, it may be a good idea (e.g., politically advisable and/or morally correct) to deploy relatively safe, limited AIs that can improve people’s quality of life but are not necessarily state of the art in terms of capability or efficiency. Such improvements can, for example, include curing diseases and solving pressing scientific and technological problems.

(I want to credit Rohin Shah as the person I got this success story from, but can’t find the post or comment where he talked about it. Was it someone else?)

Research Assistant

If an AGI project gains a lead over its competitors, it may be able to grow that into a larger lead by building AIs to help with (either safety or capability) research. This can be in the form of an oracle, or human imitation, or even narrow AIs useful for making money (which can be used to buy more compute, hire more human researchers, etc.). Such Research Assistant AIs can help pave the way to one of the other, more definitive success stories. Examples: 1, 2.

Comparison Table

|  | Sovereign Singleton | Pivotal Tool | Corrigible Contender | Interim Quality-of-Life Improver | Research Assistant |
|---|---|---|---|---|---|
| Autonomy | High | Low | Medium | Low | Low |
| AI safety ambition / difficulty | Very High | Medium | High | Low | Low |
| Reliance on human safety | Low | High | High | Medium | Medium |
| Required capability advantage over competing agents | High | High | None | None | Low |
| Tolerates capability trade-off due to safety measures | Yes | Yes | No | Yes | Some |
| Assumes strong global coordination | No | No | No | Yes | No |
| Controlled access | Yes | Yes | No | Yes | Yes |

(Note that due to limited space, I’ve left out a couple of scenarios which are straightforward recombinations of the above success stories, namely Sovereign Contender and Corrigible Singleton. I also left out CAIS because I find it hard to visualize it clearly enough as a success story to fill out its entries in the above table, plus I’m not sure if any safety researchers are currently aiming for it as a success story.)

The color coding in the table indicates how hard it would be to achieve the required condition for a success story to come to pass, with green meaning relatively easy, and yellow/pink/violet indicating increasing difficulty. Below is an explanation of what each row heading means, in case it’s not immediately clear.

Autonomy

The opposite of human-in-the-loop.

AI safety ambition/difficulty

Achieving each success story requires solving a different set of AI safety problems. This is my subjective estimate of how ambitious/difficult the corresponding set of AI safety problems is. (Please feel free to disagree in the comments!)

Reliance on human safety

How much does achieving this success story depend on humans being safe, or on solving human safety problems? This is also a subjective judgment, because different success stories rely on different aspects of human safety.

Required capability advantage over competing agents

Does achieving this success story require that the safe/aligned AI have a capability advantage over other agents in the world?

Tolerates capability trade-off due to safety measures

Many ways of achieving AI safety have a cost in terms of lowering the capability of an AI relative to an unaligned AI built using comparable resources and technology. In some scenarios this is not as consequential (e.g., because the scenario depends on achieving a large initial capability lead and then preventing any subsequent competitors from arising), and that’s indicated by a “Yes” in this row.

Assumes strong global coordination

Does this success story assume that there is strong global coordination to prevent unaligned competitors from arising?

Controlled access

Does this success story assume that only a small number of people are given access to the safe/aligned AI?

Further Thoughts

  1. This exercise made me realize that I’m confused about how the Pivotal Tool scenario is supposed to work after the initial pivotal act is done. It would likely require several years or decades to fully solve AI safety/alignment and remove the dependence on human safety, but it’s not clear how to create a safe environment for doing that after the pivotal act.

  2. One thing I’m less confused about now is why people who work toward the Contender scenarios focus more on minimizing the capability trade-off of safety measures than people who work toward the Singleton scenarios, even though the latter scenarios seem to demand more of a capability lead. It’s because the latter group think it’s possible or likely for a single AGI project to achieve a large initial capability advantage, in which case some initial capability trade-off due to safety measures is acceptable, and subsequent ongoing capability trade-off is inconsequential because there would be no competitors left.

  3. The comparison table makes Research Assistant seem a particularly attractive scenario to aim for, as a stepping stone to a more definitive success story. Is this conclusion actually justified?

  4. Interim Quality-of-Life Improver also looks very attractive, if only strong global coordination could be achieved.