
AI Suc­cess Models

Last edit: 17 Nov 2021 23:17 UTC by plex

AI Success Models are proposed paths to an existential win via aligned AI. They are (so far) high-level overviews that do not contain all the details, but each presents at least a sketch of what a full solution might look like. They can be contrasted with threat models, which are stories about how AI might lead to major problems.

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · 8 Apr 2021 15:14 UTC
60 points
7 comments · 26 min read · LW link

A positive case for how we might succeed at prosaic AI alignment

evhub · 16 Nov 2021 1:49 UTC
79 points
47 comments · 6 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · 29 May 2020 20:38 UTC
202 points
36 comments · 38 min read · LW link · 2 reviews

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · 12 May 2022 20:01 UTC
51 points
0 comments · 59 min read · LW link

Conversation with Eliezer: What do you want the system to do?

Akash · 25 Jun 2022 17:36 UTC
120 points
38 comments · 2 min read · LW link

[Question] Any further work on AI Safety Success Stories?

Krieger · 2 Oct 2022 9:53 UTC
7 points
6 comments · 1 min read · LW link

a narrative explanation of the QACI alignment plan

carado · 15 Feb 2023 3:28 UTC
42 points
26 comments · 6 min read · LW link
(carado.moe)

AI Safety “Success Stories”

Wei_Dai · 7 Sep 2019 2:54 UTC
114 points
27 comments · 4 min read · LW link · 1 review

Various Alignment Strategies (and how likely they are to work)

Logan Zoellner · 3 May 2022 16:54 UTC
76 points
37 comments · 11 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · 18 Jul 2022 7:11 UTC
52 points
8 comments · 20 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · 20 Dec 2022 13:04 UTC
47 points
18 comments · 4 min read · LW link

formal alignment: what it is, and some proposals

carado · 29 Jan 2023 11:32 UTC
35 points
2 comments · 1 min read · LW link
(carado.moe)

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky · 14 Mar 2023 19:23 UTC
50 points
6 comments · 15 min read · LW link

[Question] If AGI were coming in a year, what should we do?

MichaelStJules · 1 Apr 2022 0:41 UTC
20 points
16 comments · 1 min read · LW link

An AI-in-a-box success model

azsantosk · 11 Apr 2022 22:28 UTC
16 points
1 comment · 10 min read · LW link

How Might an Alignment Attractor Look like?

shminux · 28 Apr 2022 6:46 UTC
47 points
15 comments · 2 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · 12 May 2022 19:59 UTC
16 points
0 comments · 8 min read · LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad · 21 Jun 2022 12:36 UTC
11 points
7 comments · 9 min read · LW link

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · 9 Jul 2022 14:42 UTC
14 points
5 comments · 22 min read · LW link

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
48 points
0 comments · 1 min read · LW link
(docs.google.com)

AI Safety Endgame Stories

Ivan Vendrov · 28 Sep 2022 16:58 UTC
27 points
11 comments · 11 min read · LW link

[Question] What Does AI Alignment Success Look Like?

shminux · 20 Oct 2022 0:32 UTC
23 points
7 comments · 1 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · 13 Dec 2022 2:17 UTC
7 points
5 comments · 45 min read · LW link

Towards Hodge-podge Alignment

Cleo Nardo · 19 Dec 2022 20:12 UTC
79 points
28 comments · 9 min read · LW link