(notes on) Policy Desiderata for Superintelligent AI: A Vector Field Approach

Meta: I thought I’d spend a little time reading the policy papers that Nick Bostrom has written. I made notes as I went along, so I spent a little while cleaning them up into a summary post. These are my notes on Bostrom, Dafoe and Flynn’s 2016 policy desiderata paper, which received significant edits in 2018. I spent 6-8 hours on this post, not a great deal of time, so I’ve not been maximally careful.

Context and Goals

Overall, this is not a policy proposal. Nor does it commit strongly to a particular moral or political worldview. The goal of this paper is merely to observe which policy challenges are especially important or distinctive in the case of superintelligent AI, challenges that most moral and political worldviews will need to deal with. The paper also makes no positive argument for the importance, likelihood, or timeline of superintelligent AI; it instead assumes that this will occur in the present century, and then explores the policy challenges that would follow.

The Vector Field Approach

Bostrom, Dafoe and Flynn spend a fair amount of time explaining that they’re not going to be engaging in what (I think) Robin Hanson would call standard value talk. They’re not going to endorse a particular moral or political theory, nor are they going to adopt various moral or political theories and show how they propose different policies. They’re going to look at the details of this particular policy landscape and try to talk about the regularities that will need to be addressed by most standard moral and political frameworks, and the direction in which these regularities suggest changing policy.

They call this the ‘vector field’ approach. If you don’t feel like you fully grok the concept, here’s the quote where they lay out the formalism (with light editing for readability).

The vector field approach might then attempt to derive directional policy change conclusions of a form that we might schematically represent as follows:
“However much emphasis X you think that states ought, under present circumstances, to give to the objective of economic equality, there are certain special circumstances Y, which can be expected to hold in the radical AI context we described above, that should make you think that in those circumstances states should instead give emphasis f(X) to the objective of economic equality.”
The idea is that f here is some relatively simple function, defined over a space of possible evaluative standards or ideological positions. For instance, f might simply add a term to X, which would correspond to the claim that the emphasis given economic equality should be increased by a certain amount in the circumstances Y (according to all the ideological positions under consideration).
Or f might require telling a more complicated story, perhaps along the lines of:
“However much emphasis you give to economic equality as a policy objective under present circumstances, under conditions Y you should want to conceive of economic equality differently—certain dimensions of economic inequality are likely to become irrelevant and other dimensions are likely to become more important or policy-relevant than they are today.”

I particularly like this quote:

This vector field approach is only fruitful to the extent that there are some patterns in how the special circumstances impact policy assessments from different evaluative positions. If the prospect of radical AI had entirely different and idiosyncratic implications for every particular ideology or interest platform, then the function f would amount to nothing more than a lookup table.

I read this as saying something like “This paper only makes sense if facts matter, separately from values.” It’s funny to me that this sentence felt necessary to write.
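To make the formalism concrete, here is a minimal sketch (my own illustration, not from the paper): a function f maps each evaluative position’s present policy emphasis to a recommended emphasis under the special circumstances, and the approach is only informative if f has more structure than a per-ideology lookup table. The position names and all the numbers are invented.

```python
# Hypothetical evaluative positions and their present-day emphasis (0-1 scale,
# invented for illustration) on economic equality as a policy objective.
positions = {"egalitarian": 0.9, "libertarian": 0.2, "centrist": 0.5}

def f_additive(emphasis):
    """A 'relatively simple' f: under the special circumstances, every
    position should raise its emphasis by the same fixed amount."""
    return min(1.0, emphasis + 0.3)

# A pattern that applies across ideologies -- the informative case.
shifted = {name: f_additive(x) for name, x in positions.items()}

# The degenerate case the authors warn about: f as a mere lookup table,
# one idiosyncratic answer per ideology, with no transferable pattern.
lookup_table = {"egalitarian": 0.95, "libertarian": 0.1, "centrist": 0.7}
```

The point of the contrast is that `f_additive` tells you something even about positions not yet enumerated, whereas the lookup table tells you nothing beyond its entries.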


A few more quotes on what the paper is trying to do.

A strong proposal for the governance of advanced AI would ideally accommodate each of these desiderata to a high degree. There may exist additional desiderata that we have not identified here; we make no claim that our list is complete. Furthermore, a strong policy proposal should presumably also integrate many other normative, prudential, and practical considerations that are either idiosyncratic to particular evaluative positions or are not distinctive to the context of radical AI.
Using a “vector field” approach to normative analysis, we sought to extract directional policy implications from these special circumstances. We characterized these implications as a set of desiderata—traits of future policies, governance structures, or decision-making contexts that would, by the standards of a wide range of key actors, stakeholders, and ethical views, enhance the prospects of beneficial outcomes in the transition to a machine intelligence era.
By “policy proposals” we refer not only to official government documents but also plans and options developed by private actors who take an interest in long-term AI developments. The desiderata, therefore, are also relevant to some corporations, research funders, academic or non-profit research centers, and various other organizations and individuals.

Next are the actual desiderata. They’re given under four headings (efficiency, allocation, population, and process), each with 2-4 desiderata. Each subheading below corresponds to a policy desideratum in the paper. For each desideratum I have summarised all the arguments and considerations in the text that felt new or non-trivial to me personally (e.g. I spent only one sentence on the arguments for AI safety).

If you want to just read the paper’s summary, jump down to page 23, which has a table and summarises the desiderata in their own words.

Efficiency Desiderata

Expeditious progress

We should make sure to take hold of our cosmic endowment—and the sooner the better.

AI safety

Choose policies that lead us to develop sufficient technical understanding that the AI will do what we expect it to do, and that give these tools to AI builders.

Conditional stabilization

The ability to establish a singleton, or a regime of intensive global surveillance, or the ability to thoroughly suppress the spread of dangerous information, should we need to use this ability in the face of otherwise catastrophic global coordination failures.


Non-turbulence

Technology will change rapidly. We don’t want to have to rush regulations through, or alternatively take too long to adapt such that the environment radically changes again. So try to reduce turbulence.

Allocation Desiderata

Universal benefit

If you force someone to take a risk, it is only fair that they are compensated with a share of any reward gained. Existential risks involve everyone, so everyone should get proportional benefit.


Magnanimity

Many people’s values have diminishing returns to further resources, e.g. income guarantees for all, ensuring all animals have minimally positive lives, aesthetic projects like preserving some artworks, etc. While today they must fight for a cut of the small pie, as long as they are granted a non-zero weighting in the long run, they can be satisfied. 0.00001% of GDP may be more than enough to give all humans a $40k income, for example.

This is especially good in light of normative uncertainty—as long as we give some weighting to various values, they will get satiated in a basic way in the long run.
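The 0.00001%-of-GDP figure only works under astronomical post-transition growth; a quick back-of-the-envelope check (my own arithmetic, assuming a population of 8 billion) shows the GDP that claim implies:

```python
# Assumed figures (mine, for illustration): 8 billion people, $40k/year each.
population = 8_000_000_000
income_per_person = 40_000          # dollars per year
share_of_gdp = 0.00001 / 100        # 0.00001% expressed as a fraction (1e-7)

# GDP needed for that tiny share to cover everyone's income guarantee.
required_gdp = population * income_per_person / share_of_gdp
print(f"required GDP: ${required_gdp:.2e}")   # ~3.2e21, vs roughly 1e14 today
```

So the claim presupposes world output growing by several orders of magnitude, which is precisely the long-run, post-superintelligence regime the paragraph has in mind.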


Continuity

Reasons to expect unusually high concentration and permutation of wealth and power:

  • In the modern world, salary is more evenly distributed than capital. Superintelligent AI is likely to greatly increase the factor share of income accruing to capital, leading to massive increases in inequality and increased concentration of wealth.

  • If a small group decides how the AI works and its high-level decisions, they could gain a decisive strategic advantage and take over the world.

  • If there is radical and unpredictable technological change, then it is likely that wealth distribution will change radically and unpredictably.

  • Automated security and surveillance systems will help a regime stay alive without support from the public or elites—when behaviour is more legible it’s easier to punish or control it. This is likely not only to sustain concentration of wealth and power, but also to increase it.

As such we wish to implement policies that sustain the existing distribution of wealth and power, rather than permitting radical concentration or permutation.

Also of interest is how much (given the high likelihood of redistribution, change in concentration, and general unpredictable turbulence) we seem to face a global, real-life, Rawlsian veil of ignorance. It might be good to set up things like insurance to make sure everyone gets some minimum of power and self-determination in the future. It seems that people have diminishing returns to power: “most people would much rather be certain to have power over one life (their own) than have a 10% chance of having power over the lives of ten people and a 90% chance of having no power.”
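The diminishing-returns intuition in that quote can be checked with a toy expected-utility calculation (my own sketch; the log utility is an assumed stand-in for any concave utility of power):

```python
import math

def u(lives_controlled):
    # log(1 + x): an assumed concave utility, i.e. diminishing returns to power.
    return math.log1p(lives_controlled)

# Certain power over one life (your own)...
certain = u(1)
# ...versus a 10% chance of power over ten lives, 90% chance of none.
gamble = 0.9 * u(0) + 0.1 * u(10)

# Expected number of lives controlled is 1.0 in both cases, but under a
# concave utility the certain option wins -- hence the case for insurance-like
# guarantees of a minimum of power and self-determination.
print(certain > gamble)  # True
```

Any strictly concave utility gives the same ordering here; the specific function only changes the margin.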

Population Desiderata

Mind crime prevention

Four key factors: novelty, invisibility, difference, and magnitude.

  • Novelty and invisibility: Sentient digital entities may be moral patients. They would be a novel type of mind, and would not exhibit many characteristics that inform our moral intuitions—they lack facial expressions, physicality, human speech, and so on, if they are being run invisibly in some microprocessor. This means we should worry about policy makers making unconscionable moral decisions.

  • Difference: It is also the case that these minds may be very different to human or animal minds, again subverting our intuitions about what behaviour is normative toward them, and increasing the complexity of choosing sensible policies here.

  • Magnitude: It may be incredibly cheap to create as many people as currently exist in a country, magnifying the concerns of the previous three factors. “With high computational speed or parallelization, a large amount of suffering could be generated in a small amount of wall clock time.” This may make mind crime prevention a principal desideratum in AI policy.

Population policy

This is a worry about Malthusian scenarios (where average income falls to subsistence levels). Hanson has written about these scenarios.

This can also undermine democracy (“one person, one vote”). If a political faction can invest in creating more people, they can create the biggest voting bloc. This leaves the following trilemma of options:

  • (i) deny equal votes to all persons

  • (ii) impose constraints on creating new persons

  • (iii) accept that voting power becomes proportional to ability and willingness to pay to create voting surrogates, resulting in both economically inefficient spending on such surrogates and the political marginalization of those who lack resources or are unwilling to spend them on buying voting power

Some interesting forms of (i):

  • Make voting rights something you inherit, with a 1-1 mapping.

  • Robin Hanson has suggested ‘speed-weighted voting’, because faster ems are more costly, so you’d actually have to pay a lot for marginal voters. This still looks like richer people getting a stronger vote, but in principle puts a much higher cost on it.

Process Desiderata

First principles thinking, wisdom, and technical understanding

Overall this is an especially different environment from usual policy-making. We will need to reconsider fundamental assumptions using first-principles thinking to a greater extent than before, and to be exceptionally wise (able to get the right answer to the most important questions while surrounded by confusion and misunderstanding).

Technological innovation is the primary driver of this radical new policy landscape, and so an understanding of the technologies is unusually helpful.

Speed and decisiveness

In many possible futures, historic events will be happening faster than global treaties are typically negotiated, ratified, and implemented. We need a capacity for rapid decision-making and decisive global implementation.


Many fundamental principles will need to be re-examined. Some examples: legitimacy, consent, political participation, accountability.

Voluntary consent. Given AIs that are super-persuaders and can convince anyone of anything, consent becomes a much vaguer and fuzzier concept. Perhaps consent only counts if the consenter has an “AI guardian” or “AI advisor” of some sort.

Political participation. This norm is typically justified on three grounds:

  • Epistemic benefit of including information from a maximal diversity of sources.

  • Ensures all interests and preferences are given some weighting in the decision.

  • Intrinsic good.


Under radical AI, each of these grounds may change:

  • The epistemic effect may become negative if the AI making decisions sits at a sufficiently high epistemic vantage point.

  • AI may be able to construct a process / mechanism that accounts for all values without consistent input from humans.

  • The intrinsic good is not changed, though it may not be worth the cost if the above two factors become strongly net negative and wasteful.

The above examples, of consent and political participation, are not at all clear-cut, but go to show that there are many unquestioned assumptions in modern political debate that may need either reformulation, abandonment, or extra vigilance spent on safeguarding their existence into the future.

Changes since 2016

The paper was originally added to Nick Bostrom’s website in 2016, and received an update in late 2018 (original, current).

The main updates as I can see them are:

  • The addition of ‘vector field approach’ to the title and body. It was lightly alluded to in the initial version. (I wonder if this was due to lots of feedback trying to fit the paper into standard value talk, where it did not want to be.)

  • Changing the heading from “Mode” to “Process”, and fleshing out the three desiderata rather than a single one called “Responsibility and wisdom”. If you read the initial paper, this is the main section to re-read to get anything new.

There have definitely been significant re-writes of the opening section, and there may be more, but I did not take the time to compare them section-for-section.

I’ve added some personal reflection/updates in a comment.