Threat Models

A threat model is a story of how a particular risk (e.g. risk from AI) plays out.

In the AI risk case, according to Rohin Shah, a threat model is ideally:

A combination of a development model that says how we get AGI and a risk model that says how AGI leads to existential catastrophe.

Another (outer) alignment failure story

paulfchristiano, 7 Apr 2021 20:12 UTC
223 points
38 comments, 12 min read, LW link, 1 review

What failure looks like

paulfchristiano, 17 Mar 2019 20:18 UTC
339 points
49 comments, 8 min read, LW link, 2 reviews

Distinguishing AI takeover scenarios

8 Sep 2021 16:19 UTC
69 points
11 comments, 14 min read, LW link

What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)

Andrew_Critch, 31 Mar 2021 23:50 UTC
221 points
64 comments, 22 min read, LW link, 1 review

Vignettes Workshop (AI Impacts)

Daniel Kokotajlo, 15 Jun 2021 12:05 UTC
47 points
3 comments, 1 min read, LW link

Less Realistic Tales of Doom

Mark Xu, 6 May 2021 23:01 UTC
110 points
13 comments, 4 min read, LW link

Survey on AI existential risk scenarios

8 Jun 2021 17:12 UTC
63 points
11 comments, 7 min read, LW link

Investigating AI Takeover Scenarios

Sammy Martin, 17 Sep 2021 18:47 UTC
27 points
1 comment, 27 min read, LW link

Rogue AGI Embodies Valuable Intellectual Property

3 Jun 2021 20:37 UTC
70 points
9 comments, 3 min read, LW link

My AGI Threat Model: Misaligned Model-Based RL Agent

Steven Byrnes, 25 Mar 2021 13:45 UTC
68 points
40 comments, 16 min read, LW link

What Failure Looks Like: Distilling the Discussion

Ben Pace, 29 Jul 2020 21:49 UTC
80 points
14 comments, 7 min read, LW link

AI Could Defeat All Of Us Combined

HoldenKarnofsky, 9 Jun 2022 15:50 UTC
170 points
41 comments, 17 min read, LW link
(www.cold-takes.com)

Clarifying AI X-risk

1 Nov 2022 11:03 UTC
107 points
23 comments, 4 min read, LW link

My Overview of the AI Alignment Landscape: A Bird’s Eye View

Neel Nanda, 15 Dec 2021 23:44 UTC
121 points
9 comments, 15 min read, LW link

A central AI alignment problem: capabilities generalization, and the sharp left turn

So8res, 15 Jun 2022 13:10 UTC
261 points
48 comments, 10 min read, LW link

Refining the Sharp Left Turn threat model, part 1: claims and mechanisms

12 Aug 2022 15:17 UTC
78 points
3 comments, 3 min read, LW link
(vkrakovna.wordpress.com)

On how various plans miss the hard bits of the alignment challenge

So8res, 12 Jul 2022 2:49 UTC
270 points
78 comments, 29 min read, LW link

Refining the Sharp Left Turn threat model, part 2: applying alignment techniques

25 Nov 2022 14:36 UTC
39 points
8 comments, 6 min read, LW link
(vkrakovna.wordpress.com)

Notes on Caution

David Gross, 1 Dec 2022 3:05 UTC
13 points
0 comments, 19 min read, LW link

AI Neorealism: a threat model & success criterion for existential safety

davidad, 15 Dec 2022 13:42 UTC
44 points
0 comments, 3 min read, LW link

Contra “Strong Coherence”

DragonGod, 4 Mar 2023 20:05 UTC
38 points
24 comments, 1 min read, LW link

[Linkpost] Some high-level thoughts on the DeepMind alignment team’s strategy

7 Mar 2023 11:55 UTC
108 points
12 comments, 5 min read, LW link
(drive.google.com)

Modeling Failure Modes of High-Level Machine Intelligence

6 Dec 2021 13:54 UTC
54 points
1 comment, 12 min read, LW link

My Overview of the AI Alignment Landscape: Threat Models

Neel Nanda, 25 Dec 2021 23:07 UTC
50 points
4 comments, 28 min read, LW link

Why rationalists should care (more) about free software

RichardJActon, 23 Jan 2022 17:31 UTC
66 points
43 comments, 5 min read, LW link

A Story of AI Risk: InstructGPT-N

peterbarnett, 26 May 2022 23:22 UTC
24 points
0 comments, 8 min read, LW link

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra, 18 Jul 2022 19:06 UTC
326 points
91 comments, 84 min read, LW link

How Deadly Will Roughly-Human-Level AGI Be?

David Udell, 8 Aug 2022 1:59 UTC
12 points
6 comments, 1 min read, LW link

AGI Ruin: A List of Lethalities

Eliezer Yudkowsky, 5 Jun 2022 22:05 UTC
782 points
657 comments, 30 min read, LW link

Threat Model Literature Review

1 Nov 2022 11:03 UTC
67 points
4 comments, 25 min read, LW link

AI X-risk >35% mostly based on a recent peer-reviewed argument

michaelcohen, 2 Nov 2022 14:26 UTC
36 points
31 comments, 46 min read, LW link

Friendly and Unfriendly AGI are Indistinguishable

ErgoEcho, 29 Dec 2022 22:13 UTC
−4 points
4 comments, 4 min read, LW link
(neologos.co)

Monthly Doom Argument Threads? Doom Argument Wiki?

LVSN, 4 Feb 2023 16:59 UTC
3 points
0 comments, 1 min read, LW link

Validator models: A simple approach to detecting goodharting

beren, 20 Feb 2023 21:32 UTC
14 points
1 comment, 4 min read, LW link

[Question] What’s in your list of unsolved problems in AI alignment?

jacquesthibs, 7 Mar 2023 18:58 UTC
60 points
6 comments, 1 min read, LW link

Deep Deceptiveness

So8res, 21 Mar 2023 2:51 UTC
189 points
42 comments, 14 min read, LW link