# (Optional) Background: what are epistemic and aleatory uncertainty?

Epistemic uncertainty is uncertainty about which model of a phenomenon is correct. It can be reduced by learning more about how things work. An example is distinguishing between a fair coin and a coin that lands heads 75% of the time; these correspond to two different models of reality, and you may have uncertainty over which of these models is correct.

Aleatory uncertainty can be thought of as “intrinsic” randomness, and as such is irreducible. An example is the randomness in the outcome of a fair coin flip.

In the context of machine learning, aleatoric randomness can be thought of as irreducible under the modelling assumptions you’ve made. It may be that there is no such thing as intrinsic randomness, and everything is deterministic, if you have the right model and enough information about the state of the world. But if you’re restricting yourself to a simple class of models, there will still be many things that appear random (i.e. unpredictable) to your model.

Here’s the paper that introduced me to these terms: https://arxiv.org/abs/1703.04977

# Relevance for modelling AI-Xrisk

I’ve previously claimed something like “If running a single copy of a given AI system (let’s call it SketchyBot) for 1 month has a 5% chance of destroying the world, then running it for 5 years has a 1 - .95**60 ≈ 95% chance of destroying the world”. A similar argument applies to running many copies of SketchyBot in parallel. But I’m surprised that nobody has called me out on this (that I recall), because this reasoning is valid only if the 5% risk is an expression of purely aleatoric uncertainty.
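For concreteness, here is the naive compounding arithmetic as a short sketch; it is valid only under the assumption that each month’s 5% risk is independent of every other month’s:

```python
# Naive compounding: 60 independent months, each with a 5% chance of doom.
# Surviving all 60 months requires surviving each one in turn.
monthly_risk = 0.05
months = 60
p_survive = (1 - monthly_risk) ** months
p_doom = 1 - p_survive
print(f"P(doom over 5 years) = {p_doom:.3f}")  # ≈ 0.954
```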

In fact, this “5% chance” is best understood as combining epistemic and aleatory uncertainty (by integrating over all possible models, according to their subjective probability).

Significantly, epistemic uncertainty doesn’t have this compounding effect! For example, you could have two models of how the world works: one where we are lucky (L), and SketchyBot is completely safe, and another where we are unlucky (U), and running SketchyBot even for 1 day destroys the world (immediately). If you believe we have a 5% chance of being in world U and a 95% chance of being in world L, then you can roll the dice and run SketchyBot and not incur more than a 5% Xrisk, no matter how long it runs.
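A minimal sketch of this two-world setup (names and structure invented for illustration): because the outcome is fixed within each world, marginalizing over worlds gives a total risk that ignores runtime entirely:

```python
p_unlucky = 0.05  # prior on world U, where SketchyBot destroys the world immediately

def p_doom_by(days: int) -> float:
    # Within each world the outcome is certain: doom in U, safety in L.
    # Marginalizing over the two worlds, the result ignores `days`.
    return p_unlucky * 1.0 + (1 - p_unlucky) * 0.0

print(p_doom_by(1), p_doom_by(5 * 365))  # both 0.05: no compounding
```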

Moreover, once you’ve actually run SketchyBot for 1 day, if you’re still alive, you can conclude that you were lucky (i.e. we live in world L), and SketchyBot is in fact completely safe. To be clear, I don’t think that absence of evidence of danger is strong evidence of safety in advanced AI systems (because of things like treacherous turns), but I’d say it’s a nontrivial amount of evidence. And it seems clear that I was overestimating Xrisk by naively compounding my subjective Xrisk estimates.
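In the toy two-world model, the Bayesian update after one day of survival is maximally strong, since world U assigns survival zero probability (a sketch; a more realistic model would give U a nonzero survival probability, making the update gradual, which is the treacherous-turn worry):

```python
# Posterior on world U after observing one day of survival (toy model).
prior_U = 0.05
p_survive_given_U = 0.0  # world U kills us on day 1
p_survive_given_L = 1.0  # world L is completely safe
p_survive = prior_U * p_survive_given_U + (1 - prior_U) * p_survive_given_L
posterior_U = prior_U * p_survive_given_U / p_survive
print(posterior_U)  # 0.0: in this toy model, survival rules out world U
```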

Overall, I think the main takeaway is that there are plausible models in which we basically just get lucky, and fairly naive approaches to alignment just work. I don’t think we should bank on that, but I think it’s worth asking yourself where your uncertainty about Xrisk is coming from. Personally, I still put a lot of weight on models where the kind of advanced AI systems we’re likely to build are not dangerous by default, but carry some ~constant risk of becoming dangerous for every second they are turned on (e.g. by breaking out of a box, having critical insights about the world, instantiating inner optimizers, etc.). But I also put some weight on more FOOM-y things and at least a little bit of weight on us getting lucky.

• If running a single copy of a given AI system (let’s call it SketchyBot) for 1 month has a 5% chance of destroying the world …

Even given entirely aleatoric risk, it’s not clear to me that the compounding effect is necessary.

Suppose my model for AI risk is a very naive one: when the AI is first turned on, its values are either completely aligned (95% chance) or unaligned (5% chance). Under this model, one month after turning on the AI, I’ll have a 5% chance of being dead and a 95% chance of being an immortal demigod. Wait another month, year, or decade, and there’s still a 5% chance I’m dead and a 95% chance I’m an immortal demigod. Running other copies of the same AI in parallel doesn’t change that either.

More generally, it seems that any model of AI risk where `self.goingToDestroyTheWorld()` is evaluated exactly once isn’t subject to those sorts of multiplicative risks. In other words, 1 - .95**60 == we’re all dead only works under fairly specific conditions, no epistemic arguments required.

In fact, epistemic uncertainty can actually increase the total risk if my baseline is the evaluated-once model. Adding other worlds where the AI decides each morning whether it wants to destroy the world, or is fundamentally incompatible with humans no matter what we try, just moves that integral over all possible models towards doom.
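A sketch of that integral over models, with three illustrative worlds and invented weights (none of these numbers are from the discussion above): mixing in a per-day-roll world and a doomed world pushes the total above the evaluated-once baseline of 5%:

```python
# Total doom risk as a weighted mixture over hypothetical world-models.
# Weights and per-world risks are invented purely for illustration.
days = 365

worlds = {
    # name: (prior weight, P(doom within `days` | that world))
    "evaluated_once": (0.90, 0.05),
    "doom_roll_each_morning": (0.07, 1 - (1 - 0.001) ** days),
    "fundamentally_incompatible": (0.03, 1.0),
}
total_risk = sum(w * p for w, p in worlds.values())
print(f"total risk = {total_risk:.3f}")  # above the 0.05 baseline
```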

• To make this model a little richer and share something of how I think of it, I tend to think of the risk of any particular powerful AI the way I think of risk in deploying software.

I work in site reliability/operations, so we tend to deal with things we model as having aleatory uncertainty, like a constant risk that any particular system will fail unexpectedly for some reason (hardware failure, cosmic rays, unexpected code execution path, etc.). But I also know that most of the risk comes right at the beginning, when I first turn something on (turn on new hardware, deploy new code, etc.). A very simple model of this is one where most of the risk of failure happens right at the start and there’s little to no risk beyond that, so running for months doesn’t represent a 95% risk; almost all of the 5% risk is eaten up right at the start, because the probability distribution of failure times has nearly all of its mass at the beginning.
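A toy version of this front-loaded risk profile (all numbers invented for illustration): the hazard rate decays exponentially from startup, so the cumulative failure probability saturates near its lifetime total almost immediately rather than compounding month over month:

```python
import math

# Toy survival model with a front-loaded hazard: the failure rate decays
# exponentially after startup, so nearly all of the ~5% lifetime risk is
# incurred in the first month; running for 60 months adds almost nothing.

def p_failure_by(t: float, total_risk: float = 0.05, decay: float = 5.0) -> float:
    # Cumulative hazard H(t) = H_inf * (1 - exp(-decay * t)), t in months.
    # Survival probability is exp(-H(t)).
    h_inf = -math.log(1 - total_risk)  # total lifetime cumulative hazard
    h_t = h_inf * (1 - math.exp(-decay * t))
    return 1 - math.exp(-h_t)

print(p_failure_by(1.0))   # ≈ 0.0497: most of the risk already incurred
print(p_failure_by(60.0))  # ≈ 0.05, nowhere near 1 - 0.95**60
```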

• Agree, good point. I’d say aleatoric risk is necessary to produce compounding, but not sufficient; but maybe I’m still looking at this the wrong way.

• The mathematical property that you’re looking for is independence. In particular, your computation of 1 - .95**60 would be valid if the probability of failure in one month is independent of the probability of failure in any other month.

I don’t think aleatoric risk is necessary. Consider an ML system that was magically trained to maximize CEV (or whatever you think would make it aligned), but is still vulnerable to adversarial examples. Suppose that adversarial example questions form 1% of the space of possible questions that I could ask. (This is far too high, but whatever.) It’s probably roughly true that two different questions I ask have independent probabilities of being adversarial examples, since I have no clue what the space of adversarial examples looks like. So the probability of failure compounds in the number of questions I ask.
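This per-question compounding can be sketched directly, using the 1% figure from above and assuming independence across questions:

```python
# Compounding failure across questions: each question independently has a
# 1% chance of being an adversarial example, so asking more questions
# drives the cumulative failure probability toward 1.
p_adv = 0.01
for n in (1, 10, 100, 500):
    p_fail = 1 - (1 - p_adv) ** n
    print(n, round(p_fail, 3))  # 1 0.01, 10 0.096, 100 0.634, 500 0.993
```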

Personally, I still put a lot of weight on models where the kind of advanced AI systems we’re likely to build are not dangerous by default, but carry some ~constant risk of becoming dangerous for every second they are turned on (e.g. by breaking out of a box, having critical insights about the world, instantiating inner optimizers, etc.).

In this case I think you should estimate the probability of the AI system ever becoming dangerous (bearing in mind how long it will be operating), not the probability per second. I expect much better intuitions for the former.

• 1) Yep, independence.

2) Seems right as well.

3) I think it’s important to consider “risk per second”, because

(i) I think many AI systems could eventually become dangerous, just not on reasonable time-scales.

(ii) I think we might want to run AI systems which have the potential to become dangerous for limited periods of time.

(iii) If most of the risk is far in the future, we can hope to become better prepared in the meantime.

• I really appreciate you sharing a word for this distinction. I remember being in a discussion about the possibility of indefinite lifespans way back on the extropians mailing list, where one person argued it was impossible due to accumulation of aleatory risk, using life insurance actuarial models as a starting point. Their argument was fine as far as it went, but it created a lot of confusion when it seemed there was disagreement on just where the uncertainty lay, and I recall that trying to disentangle that model confusion led to a lot of hurt feelings. I think having terms like these, to help separate uncertainty about the model, uncertainty due to random effects, and the level of random-effect uncertainty a given model implies, would have helped tremendously.