The role of epistemic vs. aleatory uncertainty in quantifying AI-Xrisk

(Optional) Background: what are epistemic/aleatory uncertainty?

Epistemic uncertainty is uncertainty about which model of a phenomenon is correct. It can be reduced by learning more about how things work. An example is distinguishing between a fair coin and a coin that lands heads 75% of the time; these correspond to two different models of reality, and you may have uncertainty over which of these models is correct.

Aleatory uncertainty can be thought of as "intrinsic" randomness, and as such is irreducible. An example is the randomness in the outcome of a fair coin flip.

In the context of machine learning, aleatoric randomness can be thought of as irreducible under the modelling assumptions you've made. It may be that there is no such thing as intrinsic randomness, and that everything is deterministic given the right model and enough information about the state of the world. But if you're restricting yourself to a simple class of models, there will still be many things that appear random (i.e. unpredictable) to your model.

Here's the paper that introduced me to these terms: https://arxiv.org/abs/1703.04977

Relevance for modelling AI-Xrisk

I've previously claimed something like: "If running a single copy of a given AI system (let's call it SketchyBot) for 1 month has a 5% chance of destroying the world, then running it for 5 years has a 1 - 0.95^60 ≈ 95% chance of destroying the world." A similar argument applied to running many copies of SketchyBot in parallel. But I'm now surprised that nobody has called me out on this (that I recall), because this reasoning is valid only if that 5% risk expresses purely aleatory uncertainty.
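For concreteness, here's that arithmetic as a minimal Python sketch, under the assumption that the 5% is a purely aleatory, independent per-month risk (the numbers are the hypothetical ones from the claim above):

```python
# Compounding of a purely aleatory 5%-per-month risk over 5 years,
# assuming each month is an independent "dice roll".
p_month = 0.05   # hypothetical per-month probability of catastrophe
months = 60      # 5 years

p_doom = 1 - (1 - p_month) ** months
print(f"P(doom over {months} months) = {p_doom:.3f}")  # ~0.954
```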

In fact, this "5% chance" is best understood as combining epistemic and aleatory uncertainty (by integrating over all possible models, according to their subjective probability).

Significantly, epistemic uncertainty doesn't have this compounding effect! For example, you could have two models of how the world might work: one where we are lucky (L), and SketchyBot is completely safe, and another where we are unlucky (U), and running SketchyBot even for 1 day destroys the world (immediately). If you believe we have a 5% chance of being in world U and a 95% chance of being in world L, then you can roll the dice and run SketchyBot without incurring more than a 5% Xrisk, no matter how long you run it.
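Here's a small sketch of that contrast; the three-model mixture at the end uses hypothetical weights and hazards, just to illustrate the "integrate over models" point from above:

```python
months = 60  # 5 years

# Purely aleatory: an independent 5% risk each month compounds with runtime.
p_aleatory = 1 - (1 - 0.05) ** months            # ~0.95

# Purely epistemic: 5% we're in world U (doomed immediately),
# 95% we're in world L (completely safe). Runtime doesn't matter,
# so total risk never exceeds the prior on world U.
p_epistemic = 0.05 * 1.0 + 0.95 * 0.0            # = 0.05

# Mixed: integrate over models, each with its own per-month hazard
# (hypothetical prior weights and hazards, purely for illustration).
models = [(0.50, 0.00),   # safe world
          (0.45, 0.01),   # low-hazard world
          (0.05, 0.20)]   # high-hazard world
p_mixed = sum(w * (1 - (1 - h) ** months) for w, h in models)

print(round(p_aleatory, 3), p_epistemic, round(p_mixed, 3))
```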

Moreover, once you've actually run SketchyBot for 1 day, if you're still alive, you can conclude that you were lucky (i.e. we live in world L), and SketchyBot is in fact completely safe. To be clear, I don't think that absence of evidence of danger is strong evidence of safety in advanced AI systems (because of things like treacherous turns), but I'd say it's a nontrivial amount of evidence. And it seems clear that I was overestimating Xrisk by naively compounding my subjective Xrisk estimates.
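Here's a sketch of that update, again with hypothetical numbers: a Bayesian observer with a mixture of per-month hazard models shifts weight toward the safer models the longer the system runs without incident (setting aside treacherous-turn-style failure modes, which break the "survival is evidence of safety" assumption):

```python
# Posterior over hazard models after surviving t months without catastrophe.
def posterior(models, t):
    # P(survive t months | model with per-month hazard h) = (1 - h)^t
    weighted = [w * (1 - h) ** t for w, h in models]
    total = sum(weighted)
    return [x / total for x in weighted]

# Same hypothetical mixture as above: (prior weight, per-month hazard).
models = [(0.50, 0.00), (0.45, 0.01), (0.05, 0.20)]
for t in (0, 12, 60):
    print(t, [round(p, 3) for p in posterior(models, t)])
# Surviving shifts weight toward the safer models, so the implied future
# risk drops; naively compounding the prior estimate ignores this update.
```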

Overall, I think the main takeaway is that there are plausible models in which we basically just get lucky, and fairly naive approaches to alignment just work. I don't think we should bank on that, but I think it's worth asking yourself where your uncertainty about Xrisk is coming from. Personally, I still put a lot of weight on models where the kind of advanced AI systems we're likely to build are not dangerous by default, but carry some ~constant risk of becoming dangerous for every second they are turned on (e.g. by breaking out of a box, having critical insights about the world, instantiating inner optimizers, etc.). But I also put some weight on more FOOM-y scenarios, and at least a little bit of weight on us getting lucky.