Safety regulators: A tool for mitigating technological risk

Crossposted to the Effective Altruism Forum

So far the idea of differential technological development has been discussed in ways that emphasize (1) ratios of progress rates, (2) ratios of remaining work, (3) maximizing or minimizing correlations (for example, minimizing the overlap between the capability to do harm and the desire to do so), (4) implementing safe tech before developing and implementing unsafe tech, and (5) the occasional niche analysis (possibly see also a complementary aside relating differential outcomes to growth rates in the long run). I haven't seen much work on how various capacities (a generalization of technology) may interact with each other in ways that prevent downside effects (though see also The Vulnerable World Hypothesis), and I wish to elaborate on this interaction type.

As technology improves, our capacity to do both harm and good increases, and each additional capacity unlocks new capacities that can be implemented. For example, the invention of engines unlocked railroads, which in turn unlocked more efficient trade networks. However, the invention of engines also enabled the construction of mobile war vehicles. How, in an ideal world, could we implement capacities so that we get the outcomes we want while creating minimal harm and risk in the process?

What does implementing a capacity do? It enables us to change something. A normal progression is:

  1. We have no control over something (e.g. we cannot generate electricity)

  2. We have control but our choices are noisy and partially random (e.g. we can produce electric sparks on occasion but don't know how to use them)

  3. Our choices are organized but there are still downside effects (e.g. we can channel electricity to our homes but occasionally people get electrocuted or fires are started)

  4. Our use of the technology mostly doesn't have downside effects (e.g. we have capable safety regulators (e.g. insulation, fuses, ...) that allow us to minimize fire and electrocution risks)

The problem is that downside effects in stages 2 and 3 could overwhelm the value achieved during those stages and at stage 4, especially when considering powerful, game-changing technologies that could lead to existential risks.

Even more fundamentally, as agents in the world we want to avoid shifting the expected utility in a negative direction relative to other options (the opportunity costs). We want to implement new capacities in the best sequence, as with any other plan, so as to maximize the value we achieve. Value is a property of an entire plan, which is harder to reason about than just asking what the optimal (or safe) next step is (ignoring what is done after). We wish to make choosing which capacities to develop more manageable and easier to think about. One way to do this is to make sure that each capacity we implement is immediately an improvement relative to the state we're in before implementing it (this simplification is an example of a greedy algorithm heuristic). What does this simplification imply about the sequence in which we implement capacities?
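To make the greedy simplification concrete, here is a minimal Python sketch. The state representation and the `expected_value` and `implement` functions are placeholders assumed for the example, not anything specified in this post:

```python
def greedy_step(world_state, available_capacities, expected_value, implement):
    """Pick the available capacity whose immediate implementation most improves
    expected value, or return None if no capacity is an immediate improvement.

    `expected_value` (scores a world state) and `implement` (returns the state
    after implementing a capacity) are placeholders assumed for this sketch.
    """
    baseline = expected_value(world_state)
    best_capacity, best_value = None, baseline
    for capacity in available_capacities:
        value = expected_value(implement(world_state, capacity))
        if value > best_value:  # greedy: only accept immediate improvements
            best_capacity, best_value = capacity, value
    return best_capacity  # None suggests implementing a safety regulator or waiting
```

The greedy restriction shows up in the `value > best_value` check: a capacity is accepted only if implementing it is an immediate improvement over the current state, which keeps each choice local rather than requiring us to evaluate an entire plan.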

This implies that what we want is to have the capacities to do good without the downside effects and risks of those capacities. How do we do this? If we're lucky, the capacity itself has no downside risks, and we're done. But if we're not lucky, we need to implement a regulator on that capacity: a safety regulator. Let's define a safety regulator as a capacity that helps control other capacities to mitigate their downside effects. Once a capacity has been fully safety regulated, it is unlocked and we can implement it to positive effect.

Some distinctions we want to pay attention to are then (see the sketch after this list):

  • A capacity—a technology, resource, or plan that changes the world either autonomously or by enabling us to use it

  • An implemented capacity—a capacity that has already been put into effect in the world

  • An available capacity—a capacity that can be implemented immediately

  • An unlocked capacity—a capacity that is safe and beneficial to implement given the technological context, and is also available

  • The potential capacities—the set of all possible capacities: those already implemented, those being worked on, those that are available, and those that exist in theory but need prerequisite capacities to be implemented first.

  • A safety regulator—a capacity that unlocks other capacities by mitigating downside effects and possibly providing a prerequisite. (The safety regulator may or may not be unlocked itself at this stage; you may need to implement other safety regulators or capacities to unlock it.) Generally, safety regulators are somewhat specialized for the specific capacities they unlock.
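These distinctions can be pinned down with a small data-structure sketch. This is only one possible encoding, with illustrative field names of my own choosing; in particular, it approximates "safe and beneficial to implement" by checking whether all of a capacity's listed safety regulators have already been implemented:

```python
from dataclasses import dataclass, field

@dataclass
class Capacity:
    """One possible capacity; the fields are an illustrative encoding."""
    name: str
    implemented: bool = False                        # already put into effect
    prerequisites: set = field(default_factory=set)  # names that must be implemented first
    has_downsides: bool = True                       # carries unmitigated downside effects?
    regulators: set = field(default_factory=set)     # names of safety regulators that mitigate them

def available(capacity, implemented_names):
    """Available: every prerequisite capacity has been implemented."""
    return capacity.prerequisites <= implemented_names

def unlocked(capacity, implemented_names):
    """Unlocked: available, and either harmless or with all its listed safety
    regulators implemented (a capacity with downsides must list at least one)."""
    regulated = (not capacity.has_downsides) or (
        bool(capacity.regulators) and capacity.regulators <= implemented_names
    )
    return available(capacity, implemented_names) and regulated
```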

Running the suggested heuristic strategy then looks like this: if a capacity is unlocked, implement it; otherwise, either implement an unlocked safety regulator for it first or choose a different capacity to implement. We could call this a safety-regulated, capacity-expanding feedback loop. For instance, with respect to nuclear reactions, humanity (1) had the implemented capacity of access to radioactivity, (2) this made available the safety regulator of controlling chain reactions, (3) determining how to control chain reactions was implemented (through experimentation and calculation), (4) this unlocked the capacity to use chain reactions (in a controlled fashion), and (5) the capacity of using chain reactions was implemented.
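Building on the `Capacity` and `unlocked` sketch above, the feedback loop itself might look like the following. The encoding of the nuclear example (which capacities have downsides and what counts as a regulator) is a simplification for illustration:

```python
def expand_capacities(capacities):
    """Safety-regulated, capacity-expanding feedback loop (illustrative sketch).

    Repeatedly implement any capacity that is currently unlocked; implementing a
    safety regulator can unlock further capacities on the next pass. Stops when
    no new capacity can be implemented."""
    implemented = {c.name for c in capacities if c.implemented}
    progress = True
    while progress:
        progress = False
        for c in capacities:
            if c.name not in implemented and unlocked(c, implemented):
                c.implemented = True
                implemented.add(c.name)
                progress = True
    return implemented

# A rough encoding of the nuclear-reactions example:
nuclear = [
    Capacity("access to radioactivity", implemented=True, has_downsides=False),
    # Controlling chain reactions acts as the safety regulator here.
    Capacity("control of chain reactions",
             prerequisites={"access to radioactivity"}, has_downsides=False),
    Capacity("use of chain reactions",
             prerequisites={"access to radioactivity"},
             regulators={"control of chain reactions"}),
]
print(expand_capacities(nuclear))  # all three end up implemented, in a safe order
```

The ordering falls out of the unlocking structure: the regulator (controlling chain reactions) is implemented before the risky capacity (using chain reactions), matching steps (1) through (5) above.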

Limitations and extensions to this method:

  • It's difficult to tell which of the unlocked capacities to implement at a particular step. But we'll assume some sort of decision process exists for optimizing that.

  • Capacities may be good temporarily, but if other capacities are not implemented in time, they may become harmful (see the loss unstable states idea).

  • Implementing capacities in this way isn't necessarily optimal, because this approach does not allow for temporary bad effects that yield better results in the long run.

  • Capacities do not necessarily stay unlocked forever, due to interactions with other capacities that may be implemented in the interim.

  • A locked capacity may be net good to implement if a safety regulator is implemented before the downside effects could take place (this is related to handling cluelessness).

  • The detailed interaction between capacities, and planning which to develop in which order, resembles the type of problem the TWEAK planner was built for, and it may be one good starting point for further research.

  • In more detail, how can one capacity prevent the negative effects of another?