Oversight of Unsafe Systems via Dynamic Safety Envelopes


I had an idea for short-term, non-superhuman AI safety that I recently wrote up and have now posted on Arxiv. This post serves to introduce the idea, and to request feedback from a more safety-oriented group than the one I would otherwise present the ideas to.

In short, the paper tries to adapt a paradigm that Mobileye has presented for autonomous vehicle safety to a much more general setting. The paradigm is to have a “safety envelope” dictated by an algorithm separate from the policy algorithm that drives the vehicle, setting speed and distance limits for the vehicle based on the positions of the vehicles around it.

For self-driving cars, this works well because there is a physics-based model of the system that can be used to find an algorithmic envelope. In arbitrary other systems, it works less well, because we don’t have good fundamental models of what safe behavior means. For example, financial markets have “circuit breakers” that give the system an opportunity to take a break when something unexpected happens, but the values for the circuit breakers are set via a simple heuristic that doesn’t relate to the dynamics of the system in question. I propose taking a middle path: dynamically learning a safety envelope.
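To make the middle path concrete, here is a minimal sketch of what a dynamically learned envelope might look like for a single monitored quantity (speed, order volume, etc.). All names here are hypothetical illustrations, not an implementation from the paper: the envelope learns a bound from recent observed behavior, rather than taking it from a physics model or a static heuristic.

```python
import collections


class DynamicSafetyEnvelope:
    """Hypothetical sketch: learn a limit on one monitored quantity
    from a sliding window of recent observations, and veto any
    proposed action that exceeds the learned limit."""

    def __init__(self, window=1000, quantile=0.99, margin=1.5):
        self.history = collections.deque(maxlen=window)
        self.quantile = quantile  # where in the observed range the bound sits
        self.margin = margin      # slack multiplier above the observed quantile

    def observe(self, value):
        self.history.append(value)

    def limit(self):
        if not self.history:
            return float("inf")  # no data yet, so no learned bound
        ordered = sorted(self.history)
        idx = min(int(self.quantile * len(ordered)), len(ordered) - 1)
        return ordered[idx] * self.margin

    def permits(self, proposed_value):
        # The policy engine does not see this check's internals;
        # it only learns whether a given action was vetoed.
        return proposed_value <= self.limit()
```

The point of the separation is visible in `permits`: the envelope is a read-only gate around the policy's actions, not a term in the policy's objective.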

In building separate models for safety and for policy, I think the system can address a different problem being discussed in military and other AI contexts, which is that “human-in-the-loop” control is impossible for normal ML systems, since it slows the reaction time down to the level of human reactions. The proposed paradigm of a safety-envelope learning system can be meaningfully controlled by humans, because the safety system can adapt on a slower timescale than the policy system that makes the lower-level decisions.
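The two-timescale idea can be sketched as follows. Everything here is an illustrative assumption rather than the paper's design: the policy acts (and is clamped by the current limit) on every step, while the envelope's learned limit only changes at a slow, human-paced review cadence.

```python
class SlowEnvelope:
    """Minimal stand-in for a learned envelope (hypothetical names)."""

    def __init__(self, initial_limit):
        self.limit = initial_limit
        self.recent = []

    def record(self, value):
        self.recent.append(value)

    def proposed_limit(self):
        # Toy update rule: propose a limit slightly above the largest
        # action actually taken since the last review.
        return 1.2 * max(self.recent) if self.recent else self.limit


def run_with_human_oversight(policy_step, envelope, human_approves,
                             steps=10_000, review_every=1_000):
    """Fast inner loop: clamp each action to the active limit.
    Slow outer loop: a human approves (or rejects) each limit change."""
    active_limit = envelope.limit
    for t in range(steps):
        action = policy_step(t)
        if action > active_limit:        # fast, automatic veto
            action = active_limit        # clamp to the current safe bound
        envelope.record(action)
        if (t + 1) % review_every == 0:  # slow, human-paced review
            proposed = envelope.proposed_limit()
            if human_approves(active_limit, proposed):
                active_limit = proposed
                envelope.recent.clear()
    return active_limit
```

The human only needs to keep pace with the review cadence, not with the policy's per-step decisions, which is what makes meaningful oversight possible here.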

Quick Q&A

1) How do we build heuristic safety envelopes in practice?

This depends on the system in question. I would be very interested in identifying domains where this class of solution could be implemented, either in toy models or in full systems.

2) Why is this better than a system that optimizes for safety?

Balancing optimization for goals against optimization for safety can lead to perverse effects. If the system optimizing for safety is segregated, and the policy engine is not given access to it, this should not occur.

This also allows the safety system to be built and monitored by a regulator, instead of by the owners of the system. In the case of Mobileye’s proposed system, a self-driving car could have the parameters of the safety envelope dictated by traffic authorities, instead of relying on car manufacturers to implement systems that drive safely as the manufacturers themselves see fit.

3) Are there any obvious shortcomings to this approach?

Yes. This does not scale to human- or superhuman-level general intelligence, because a system aware of the constraints can attempt to design policies that avoid them. It is primarily intended to serve as a stop-gap measure to marginally improve the safety of near-term machine learning systems.