
AI Suc­cess Models

Last edit: 17 Nov 2021 23:17 UTC by plex

AI Success Models are proposed paths to an existential win via aligned AI. They are (so far) high-level overviews that do not contain all the details, but each presents at least a sketch of what a full solution might look like. They can be contrasted with threat models, which are stories about how AI might lead to major problems.

Solving the whole AGI control problem, version 0.0001

Steven Byrnes · 8 Apr 2021 15:14 UTC
60 points
7 comments · 26 min read · LW link

A positive case for how we might succeed at prosaic AI alignment

evhub · 16 Nov 2021 1:49 UTC
79 points
47 comments · 6 min read · LW link

An overview of 11 proposals for building safe advanced AI

evhub · 29 May 2020 20:38 UTC
202 points
36 comments · 38 min read · LW link · 2 reviews

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · 12 May 2022 20:01 UTC
51 points
0 comments · 59 min read · LW link

Conversation with Eliezer: What do you want the system to do?

Akash · 25 Jun 2022 17:36 UTC
120 points
38 comments · 2 min read · LW link

[Question] Any further work on AI Safety Success Stories?

Krieger · 2 Oct 2022 9:53 UTC
7 points
6 comments · 1 min read · LW link

a narrative explanation of the QACI alignment plan

carado · 15 Feb 2023 3:28 UTC
42 points
26 comments · 6 min read · LW link
(carado.moe)

AI Safety “Success Stories”

Wei_Dai · 7 Sep 2019 2:54 UTC
114 points
27 comments · 4 min read · LW link · 1 review

Various Alignment Strategies (and how likely they are to work)

Logan Zoellner · 3 May 2022 16:54 UTC
76 points
37 comments · 11 min read · LW link

Conditioning Generative Models for Alignment

Jozdien · 18 Jul 2022 7:11 UTC
52 points
8 comments · 20 min read · LW link

An Open Agency Architecture for Safe Transformative AI

davidad · 20 Dec 2022 13:04 UTC
47 points
18 comments · 4 min read · LW link

formal alignment: what it is, and some proposals

carado · 29 Jan 2023 11:32 UTC
35 points
2 comments · 1 min read · LW link
(carado.moe)

Success without dignity: a nearcasting story of avoiding catastrophe by luck

HoldenKarnofsky · 14 Mar 2023 19:23 UTC
50 points
6 comments · 15 min read · LW link

[Question] If AGI were coming in a year, what should we do?

MichaelStJules · 1 Apr 2022 0:41 UTC
20 points
16 comments · 1 min read · LW link

An AI-in-a-box success model

azsantosk · 11 Apr 2022 22:28 UTC
16 points
1 comment · 10 min read · LW link

How Might an Alignment Attractor Look like?

shminux · 28 Apr 2022 6:46 UTC
47 points
15 comments · 2 min read · LW link

Introduction to the sequence: Interpretability Research for the Most Important Century

Evan R. Murphy · 12 May 2022 19:59 UTC
16 points
0 comments · 8 min read · LW link

Getting from an unaligned AGI to an aligned AGI?

Tor Økland Barstad · 21 Jun 2022 12:36 UTC
11 points
7 comments · 9 min read · LW link

Making it harder for an AGI to “trick” us, with STVs

Tor Økland Barstad · 9 Jul 2022 14:42 UTC
14 points
5 comments · 22 min read · LW link

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
48 points
0 comments · 1 min read · LW link
(docs.google.com)

AI Safety Endgame Stories

Ivan Vendrov · 28 Sep 2022 16:58 UTC
27 points
11 comments · 11 min read · LW link

[Question] What Does AI Alignment Success Look Like?

shminux · 20 Oct 2022 0:32 UTC
23 points
7 comments · 1 min read · LW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad · 13 Dec 2022 2:17 UTC
7 points
5 comments · 45 min read · LW link

Towards Hodge-podge Alignment

Cleo Nardo · 19 Dec 2022 20:12 UTC
79 points
28 comments · 9 min read · LW link