Threat Models

A threat model is a story of how a particular risk (e.g. risk from AI) plays out.

In the AI case, according to Rohin Shah, a threat model is ideally:

A combination of a development model that says how we get AGI and a risk model that says how AGI leads to existential catastrophe.

Another (outer) alignment failure story

paulfchristiano, 7 Apr 2021 20:12 UTC
177 points
33 comments, 12 min read

Rogue AGI Embodies Valuable Intellectual Property

3 Jun 2021 20:37 UTC
69 points
9 comments, 3 min read

What failure looks like

paulfchristiano, 17 Mar 2019 20:18 UTC
247 points
48 comments, 8 min read, 2 nominations, 2 reviews

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Andrew_Critch, 31 Mar 2021 23:50 UTC
145 points
57 comments, 22 min read

Less Realistic Tales of Doom

Mark Xu, 6 May 2021 23:01 UTC
94 points
13 comments, 4 min read

My AGI Threat Model: Misaligned Model-Based RL Agent

Steven Byrnes, 25 Mar 2021 13:45 UTC
62 points
40 comments, 16 min read

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
54 points
10 comments, 7 min read

What Failure Looks Like: Distilling the Discussion

Ben Pace, 29 Jul 2020 21:49 UTC
74 points
12 comments, 7 min read
