RSS

Threat Models

TagLast edit: 20 Apr 2021 21:57 UTC by Quinn

A threat model is a story of how a particular risk (e.g. AI) plays out.

In the AI case, according to Rohin Shah, a threat model is ideally:

Combination of a development model that says how we get AGI and a risk model that says how AGI leads to existential catastrophe.

Why ra­tio­nal­ists should care (more) about free software

RichardJActon23 Jan 2022 17:31 UTC
56 points
37 comments5 min readLW link

My Overview of the AI Align­ment Land­scape: Threat Models

Neel Nanda25 Dec 2021 23:07 UTC
35 points
4 comments28 min readLW link

My Overview of the AI Align­ment Land­scape: A Bird’s Eye View

Neel Nanda15 Dec 2021 23:44 UTC
91 points
8 comments16 min readLW link

Model­ing Failure Modes of High-Level Ma­chine Intelligence

6 Dec 2021 13:54 UTC
53 points
1 comment12 min readLW link

In­ves­ti­gat­ing AI Takeover Scenarios

Sammy Martin17 Sep 2021 18:47 UTC
26 points
1 comment27 min readLW link

Dist­in­guish­ing AI takeover scenarios

8 Sep 2021 16:19 UTC
60 points
11 comments14 min readLW link

Vignettes Work­shop (AI Im­pacts)

Daniel Kokotajlo15 Jun 2021 12:05 UTC
47 points
3 comments1 min readLW link

Sur­vey on AI ex­is­ten­tial risk scenarios

8 Jun 2021 17:12 UTC
56 points
10 comments7 min readLW link

Rogue AGI Em­bod­ies Valuable In­tel­lec­tual Property

3 Jun 2021 20:37 UTC
69 points
9 comments3 min readLW link

Less Real­is­tic Tales of Doom

Mark Xu6 May 2021 23:01 UTC
99 points
13 comments4 min readLW link

Another (outer) al­ign­ment failure story

paulfchristiano7 Apr 2021 20:12 UTC
191 points
37 comments12 min readLW link

What Mul­tipo­lar Failure Looks Like, and Ro­bust Agent-Ag­nos­tic Pro­cesses (RAAPs)

Andrew_Critch31 Mar 2021 23:50 UTC
162 points
59 comments22 min readLW link

My AGI Threat Model: Misal­igned Model-Based RL Agent

Steven Byrnes25 Mar 2021 13:45 UTC
62 points
40 comments16 min readLW link

What Failure Looks Like: Distill­ing the Discussion

Ben Pace29 Jul 2020 21:49 UTC
74 points
12 comments7 min readLW link

What failure looks like

paulfchristiano17 Mar 2019 20:18 UTC
265 points
49 comments8 min readLW link2 reviews