AI Risk (tag)
Last edit: 16 Jul 2020 10:29 UTC by Ben Pace

AI Risk is analysis of the risks associated with building powerful AI systems.

Posts tagged AI Risk, sorted by relevance:
Superintelligence FAQ · Scott Alexander · 20 Sep 2016 19:00 UTC · 84 points · 11 comments · 27 min read
What failure looks like · paulfchristiano · 17 Mar 2019 20:18 UTC · 287 points · 49 comments · 8 min read · 2 reviews
Specification gaming examples in AI · Vika · 3 Apr 2018 12:30 UTC · 39 points · 9 comments · 1 min read · 2 reviews
Discussion with Eliezer Yudkowsky on AGI interventions · Rob Bensinger and Eliezer Yudkowsky · 11 Nov 2021 3:01 UTC · 326 points · 256 comments · 34 min read
Intuitions about goal-directed behavior · Rohin Shah · 1 Dec 2018 4:25 UTC · 50 points · 15 comments · 6 min read
Epistemological Framing for AI Alignment Research · adamShimi · 8 Mar 2021 22:05 UTC · 53 points · 7 comments · 9 min read
AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC · 655 points · 624 comments · 30 min read
What can the principal-agent literature tell us about AI risk? · Alexis Carlier · 8 Feb 2020 21:28 UTC · 101 points · 31 comments · 16 min read
Developmental Stages of GPTs · orthonormal · 26 Jul 2020 22:03 UTC · 140 points · 74 comments · 7 min read · 1 review
[Question] Will OpenAI’s work unintentionally increase existential risks related to AI? · adamShimi · 11 Aug 2020 18:16 UTC · 50 points · 56 comments · 1 min read
Another (outer) alignment failure story · paulfchristiano · 7 Apr 2021 20:12 UTC · 203 points · 37 comments · 12 min read
How good is humanity at coordination? · Buck · 21 Jul 2020 20:01 UTC · 77 points · 43 comments · 3 min read
A Gym Gridworld Environment for the Treacherous Turn · Michaël Trazzi · 28 Jul 2018 21:27 UTC · 71 points · 9 comments · 3 min read · (github.com)
Are minimal circuits deceptive? · evhub · 7 Sep 2019 18:11 UTC · 56 points · 11 comments · 8 min read
Soft takeoff can still lead to decisive strategic advantage · Daniel Kokotajlo · 23 Aug 2019 16:39 UTC · 117 points · 46 comments · 8 min read · 4 reviews
Should we postpone AGI until we reach safety? · otto.barten · 18 Nov 2020 15:43 UTC · 26 points · 36 comments · 3 min read
On Solving Problems Before They Appear: The Weird Epistemologies of Alignment · adamShimi · 11 Oct 2021 8:20 UTC · 93 points · 11 comments · 15 min read
DL towards the unaligned Recursive Self-Optimization attractor · jacob_cannell · 18 Dec 2021 2:15 UTC · 32 points · 22 comments · 4 min read
Truthful LMs as a warm-up for aligned AGI · Jacob_Hilton · 17 Jan 2022 16:49 UTC · 64 points · 14 comments · 13 min read
MIRI announces new “Death With Dignity” strategy · Eliezer Yudkowsky · 2 Apr 2022 0:43 UTC · 324 points · 518 comments · 18 min read
AI Could Defeat All Of Us Combined · HoldenKarnofsky · 9 Jun 2022 15:50 UTC · 162 points · 27 comments · 17 min read · (www.cold-takes.com)
Critiquing “What failure looks like” · Grue_Slinky · 27 Dec 2019 23:59 UTC · 35 points · 6 comments · 3 min read
The Main Sources of AI Risk? · Daniel Kokotajlo and Wei_Dai · 21 Mar 2019 18:28 UTC · 78 points · 22 comments · 2 min read
Clarifying some key hypotheses in AI alignment · Ben Cottier and Rohin Shah · 15 Aug 2019 21:29 UTC · 78 points · 12 comments · 9 min read
“Taking AI Risk Seriously” (thoughts by Critch) · Raemon · 29 Jan 2018 9:27 UTC · 111 points · 68 comments · 13 min read
Some conceptual highlights from “Disjunctive Scenarios of Catastrophic AI Risk” · Kaj_Sotala · 12 Feb 2018 12:30 UTC · 33 points · 4 comments · 6 min read · (kajsotala.fi)
Non-Adversarial Goodhart and AI Risks · Davidmanheim · 27 Mar 2018 1:39 UTC · 22 points · 11 comments · 6 min read
Six AI Risk/Strategy Ideas · Wei_Dai · 27 Aug 2019 0:40 UTC · 63 points · 18 comments · 4 min read · 1 review
[Question] Did AI pioneers not worry much about AI risks? · lisperati · 9 Feb 2020 19:58 UTC · 42 points · 9 comments · 1 min read
Some disjunctive reasons for urgency on AI risk · Wei_Dai · 15 Feb 2019 20:43 UTC · 36 points · 24 comments · 1 min read
Drexler on AI Risk · PeterMcCluskey · 1 Feb 2019 5:11 UTC · 34 points · 10 comments · 9 min read · (www.bayesianinvestor.com)
A shift in arguments for AI risk · Richard_Ngo · 28 May 2019 13:47 UTC · 32 points · 7 comments · 1 min read · (fragile-credences.github.io)
Disentangling arguments for the importance of AI safety · Richard_Ngo · 21 Jan 2019 12:41 UTC · 128 points · 23 comments · 8 min read
AI Safety “Success Stories” · Wei_Dai · 7 Sep 2019 2:54 UTC · 107 points · 27 comments · 4 min read · 1 review
Debate on Instrumental Convergence between LeCun, Russell, Bengio, Zador, and More · Ben Pace · 4 Oct 2019 4:08 UTC · 191 points · 57 comments · 15 min read · 2 reviews
[AN #80]: Why AI risk might be solved without additional intervention from longtermists · Rohin Shah · 2 Jan 2020 18:20 UTC · 35 points · 94 comments · 10 min read · (mailchi.mp)
The strategy-stealing assumption · paulfchristiano · 16 Sep 2019 15:23 UTC · 70 points · 46 comments · 12 min read · 3 reviews
Thinking soberly about the context and consequences of Friendly AI · Mitchell_Porter · 16 Oct 2012 4:33 UTC · 20 points · 39 comments · 1 min read
Announcement: AI alignment prize winners and next round · cousin_it · 15 Jan 2018 14:33 UTC · 80 points · 68 comments · 2 min read
What Failure Looks Like: Distilling the Discussion · Ben Pace · 29 Jul 2020 21:49 UTC · 74 points · 12 comments · 7 min read
Uber Self-Driving Crash · jefftk · 7 Nov 2019 15:00 UTC · 110 points · 1 comment · 2 min read · (www.jefftk.com)
Reply to Holden on ‘Tool AI’ · Eliezer Yudkowsky · 12 Jun 2012 18:00 UTC · 151 points · 357 comments · 17 min read
Stanford Encyclopedia of Philosophy on AI ethics and superintelligence · Kaj_Sotala · 2 May 2020 7:35 UTC · 43 points · 19 comments · 7 min read · (plato.stanford.edu)
AGI Safety Literature Review (Everitt, Lea & Hutter 2018) · Kaj_Sotala · 4 May 2018 8:56 UTC · 13 points · 1 comment · 1 min read · (arxiv.org)
Response to Oren Etzioni’s “How to know if artificial intelligence is about to destroy civilization” · Daniel Kokotajlo · 27 Feb 2020 18:10 UTC · 27 points · 5 comments · 8 min read
Why don’t singularitarians bet on the creation of AGI by buying stocks? · John_Maxwell · 11 Mar 2020 16:27 UTC · 43 points · 20 comments · 4 min read
The problem/solution matrix: Calculating the probability of AI safety “on the back of an envelope” · John_Maxwell · 20 Oct 2019 8:03 UTC · 22 points · 4 comments · 2 min read
Three Stories for How AGI Comes Before FAI · John_Maxwell · 17 Sep 2019 23:26 UTC · 27 points · 8 comments · 6 min read
Brainstorming additional AI risk reduction ideas · John_Maxwell · 14 Jun 2012 7:55 UTC · 19 points · 37 comments · 1 min read
AI Alignment 2018-19 Review · Rohin Shah · 28 Jan 2020 2:19 UTC · 125 points · 6 comments · 35 min read
The Fusion Power Generator Scenario · johnswentworth · 8 Aug 2020 18:31 UTC · 132 points · 29 comments · 3 min read
A guide to Iterated Amplification & Debate · Rafael Harth · 15 Nov 2020 17:14 UTC · 66 points · 9 comments · 15 min read
Work on Security Instead of Friendliness? · Wei_Dai · 21 Jul 2012 18:28 UTC · 51 points · 107 comments · 2 min read
An unaligned benchmark · paulfchristiano · 17 Nov 2018 15:51 UTC · 31 points · 0 comments · 9 min read
Artificial Intelligence: A Modern Approach (4th edition) on the Alignment Problem · Zack_M_Davis · 17 Sep 2020 2:23 UTC · 72 points · 12 comments · 5 min read · (aima.cs.berkeley.edu)
Clarifying “What failure looks like” · Sam Clarke · 20 Sep 2020 20:40 UTC · 88 points · 14 comments · 17 min read
Relaxed adversarial training for inner alignment · evhub · 10 Sep 2019 23:03 UTC · 57 points · 22 comments · 27 min read
An overview of 11 proposals for building safe advanced AI · evhub · 29 May 2020 20:38 UTC · 184 points · 34 comments · 38 min read · 2 reviews
Risks from Learned Optimization: Introduction · evhub, Chris van Merwijk, vlad_m, Joar Skalse and Scott Garrabrant · 31 May 2019 23:44 UTC · 156 points · 42 comments · 12 min read · 3 reviews
Risks from Learned Optimization: Conclusion and Related Work · evhub, Chris van Merwijk, vlad_m, Joar Skalse and Scott Garrabrant · 7 Jun 2019 19:53 UTC · 73 points · 4 comments · 6 min read
Deceptive Alignment · evhub, Chris van Merwijk, vlad_m, Joar Skalse and Scott Garrabrant · 5 Jun 2019 20:16 UTC · 84 points · 11 comments · 17 min read
The Inner Alignment Problem · evhub, Chris van Merwijk, vlad_m, Joar Skalse and Scott Garrabrant · 4 Jun 2019 1:20 UTC · 90 points · 17 comments · 13 min read
Conditions for Mesa-Optimization · evhub, Chris van Merwijk, vlad_m, Joar Skalse and Scott Garrabrant · 1 Jun 2019 20:52 UTC · 68 points · 47 comments · 12 min read
AI risk hub in Singapore? · Daniel Kokotajlo · 29 Oct 2020 11:45 UTC · 51 points · 18 comments · 4 min read
Thoughts on Robin Hanson’s AI Impacts interview · Steven Byrnes · 24 Nov 2019 1:40 UTC · 25 points · 3 comments · 7 min read
The AI Safety Game (UPDATED) · Daniel Kokotajlo · 5 Dec 2020 10:27 UTC · 41 points · 9 comments · 3 min read
[Question] Suggestions of posts on the AF to review · adamShimi · 16 Feb 2021 12:40 UTC · 56 points · 20 comments · 1 min read
Google’s Ethical AI team and AI Safety · magfrump · 20 Feb 2021 9:42 UTC · 12 points · 16 comments · 7 min read
Behavioral Sufficient Statistics for Goal-Directedness · adamShimi · 11 Mar 2021 15:01 UTC · 21 points · 12 comments · 9 min read
Review of “Fun with +12 OOMs of Compute” · adamShimi, Joe_Collman and Gyrodiot · 28 Mar 2021 14:55 UTC · 57 points · 20 comments · 8 min read
April drafts · AI Impacts · 1 Apr 2021 18:10 UTC · 49 points · 2 comments · 1 min read · (aiimpacts.org)
25 Min Talk on MetaEthical.AI with Questions from Stuart Armstrong · June Ku · 29 Apr 2021 15:38 UTC · 21 points · 7 comments · 1 min read
Less Realistic Tales of Doom · Mark Xu · 6 May 2021 23:01 UTC · 100 points · 13 comments · 4 min read
Rogue AGI Embodies Valuable Intellectual Property · Mark Xu and CarlShulman · 3 Jun 2021 20:37 UTC · 69 points · 9 comments · 3 min read
Environmental Structure Can Cause Instrumental Convergence · TurnTrout · 22 Jun 2021 22:26 UTC · 71 points · 44 comments · 16 min read · (arxiv.org)
Alex Turner’s Research, Comprehensive Information Gathering · adamShimi · 23 Jun 2021 9:44 UTC · 15 points · 3 comments · 3 min read
Sam Altman and Ezra Klein on the AI Revolution · Zack_M_Davis · 27 Jun 2021 4:53 UTC · 38 points · 17 comments · 1 min read · (www.nytimes.com)
Approaches to gradient hacking · adamShimi · 14 Aug 2021 15:16 UTC · 16 points · 7 comments · 8 min read
[Question] What are good alignment conference papers? · adamShimi · 28 Aug 2021 13:35 UTC · 12 points · 2 comments · 1 min read
Why the technological singularity by AGI may never happen · hippke · 3 Sep 2021 14:19 UTC · 5 points · 14 comments · 1 min read
[Question] Conditional on the first AGI being aligned correctly, is a good outcome even still likely? · iamthouthouarti · 6 Sep 2021 17:30 UTC · 2 points · 1 comment · 1 min read
Bayeswatch 7: Wildfire · lsusr · 8 Sep 2021 5:35 UTC · 47 points · 6 comments · 3 min read
[Book Review] “The Alignment Problem” by Brian Christian · lsusr · 20 Sep 2021 6:36 UTC · 69 points · 16 comments · 6 min read
Is progress in ML-assisted theorem-proving beneficial? · MakoYass · 28 Sep 2021 1:54 UTC · 10 points · 2 comments · 1 min read
Interview with Skynet · lsusr · 30 Sep 2021 2:20 UTC · 49 points · 1 comment · 2 min read
Epistemic Strategies of Selection Theorems · adamShimi · 18 Oct 2021 8:57 UTC · 32 points · 1 comment · 12 min read
Epistemic Strategies of Safety-Capabilities Tradeoffs · adamShimi · 22 Oct 2021 8:22 UTC · 5 points · 0 comments · 6 min read
Drug addicts and deceptively aligned agents—a comparative analysis · Jan · 5 Nov 2021 21:42 UTC · 41 points · 2 comments · 12 min read · (universalprior.substack.com)
Using Brain-Computer Interfaces to get more data for AI alignment · Robbo · 7 Nov 2021 0:00 UTC · 37 points · 10 comments · 7 min read
Using blinders to help you see things for what they are · adamzerner · 11 Nov 2021 7:07 UTC · 13 points · 2 comments · 2 min read
My current uncertainties regarding AI, alignment, and the end of the world · dominicq · 14 Nov 2021 14:08 UTC · 2 points · 3 comments · 2 min read
Ngo and Yudkowsky on alignment difficulty · Eliezer Yudkowsky and Richard_Ngo · 15 Nov 2021 20:31 UTC · 231 points · 143 comments · 99 min read
Applications for AI Safety Camp 2022 Now Open! · adamShimi · 17 Nov 2021 21:42 UTC · 47 points · 3 comments · 1 min read
[Question] Does the Structure of an algorithm matter for AI Risk and/or consciousness? · Logan Zoellner · 3 Dec 2021 18:31 UTC · 7 points · 5 comments · 1 min read
Framing approaches to alignment and the hard problem of AI cognition · ryan_greenblatt · 15 Dec 2021 19:06 UTC · 6 points · 15 comments · 27 min read
Some abstract, non-technical reasons to be non-maximally-pessimistic about AI alignment · Rob Bensinger · 12 Dec 2021 2:08 UTC · 55 points · 37 comments · 6 min read
Summary of the Acausal Attack Issue for AIXI · Diffractor · 13 Dec 2021 8:16 UTC · 13 points · 6 comments · 4 min read
My Overview of the AI Alignment Landscape: A Bird’s Eye View · Neel Nanda · 15 Dec 2021 23:44 UTC · 104 points · 9 comments · 16 min read
My Overview of the AI Alignment Landscape: Threat Models · Neel Nanda · 25 Dec 2021 23:07 UTC · 38 points · 4 comments · 28 min read
AI Fire Alarm Scenarios · PeterMcCluskey · 28 Dec 2021 2:20 UTC · 10 points · 0 comments · 6 min read · (www.bayesianinvestor.com)
More Is Different for AI · jsteinhardt · 4 Jan 2022 19:30 UTC · 135 points · 21 comments · 3 min read · (bounded-regret.ghost.io)
Challenges with Breaking into MIRI-Style Research · Chris_Leong · 17 Jan 2022 9:23 UTC · 69 points · 15 comments · 3 min read
Thoughts on AGI safety from the top · jylin04 · 2 Feb 2022 20:06 UTC · 21 points · 2 comments · 32 min read
Paradigm-building from first principles: Effective altruism, AGI, and alignment · Cameron Berg · 8 Feb 2022 16:12 UTC · 24 points · 5 comments · 14 min read
Paradigm-building: Introduction · Cameron Berg · 8 Feb 2022 0:06 UTC · 24 points · 0 comments · 2 min read
How I Formed My Own Views About AI Safety · Neel Nanda · 27 Feb 2022 18:50 UTC · 62 points · 5 comments · 13 min read · (www.neelnanda.io)
[Question] Would (myopic) general public good producers significantly accelerate the development of AGI? · MakoYass · 2 Mar 2022 23:47 UTC · 24 points · 10 comments · 1 min read
AXRP Episode 13 - First Principles of AGI Safety with Richard Ngo · DanielFilan · 31 Mar 2022 5:20 UTC · 24 points · 1 comment · 48 min read
[RETRACTED] It’s time for EA leadership to pull the short-timelines fire alarm. · Not Relevant · 8 Apr 2022 16:07 UTC · 111 points · 164 comments · 4 min read
AMA Conjecture, A New Alignment Startup · adamShimi · 9 Apr 2022 9:43 UTC · 43 points · 40 comments · 1 min read
Why I’m Worried About AI · peterbarnett · 23 May 2022 21:13 UTC · 21 points · 2 comments · 12 min read
Complex Systems for AI Safety [Pragmatic AI Safety #3] · Dan Hendrycks and ThomasWoodside · 24 May 2022 0:00 UTC · 30 points · 0 comments · 21 min read
Will working here advance AGI? Help us not destroy the world! · Yonatan Cale · 29 May 2022 11:42 UTC · 29 points · 43 comments · 1 min read
Distilled—AGI Safety from First Principles · Harrison G · 29 May 2022 0:57 UTC · 4 points · 0 comments · 14 min read
Perform Tractable Research While Avoiding Capabilities Externalities [Pragmatic AI Safety #4] · Dan Hendrycks and ThomasWoodside · 30 May 2022 20:25 UTC · 33 points · 3 comments · 25 min read
Artificial Intelligence Safety for the Averagely Intelligent · Salsabila Mahdi · 1 Jun 2022 6:24 UTC · 5 points · 2 comments · 1 min read
Confused why a “capabilities research is good for alignment progress” position isn’t discussed more · Kaj_Sotala · 2 Jun 2022 21:41 UTC · 126 points · 26 comments · 4 min read
I’m trying out “asteroid mindset” · Alex_Altair · 3 Jun 2022 13:35 UTC · 83 points · 5 comments · 4 min read
Epistemological Vigilance for Alignment · adamShimi · 6 Jun 2022 0:27 UTC · 45 points · 10 comments · 11 min read
A Quick Guide to Confronting Doom · Ruby · 13 Apr 2022 19:30 UTC · 222 points · 36 comments · 2 min read
Why I don’t believe in doom · mukashi · 7 Jun 2022 23:49 UTC · 6 points · 30 comments · 4 min read
Open Problems in AI X-Risk [PAIS #5] · Dan Hendrycks and ThomasWoodside · 10 Jun 2022 2:08 UTC · 41 points · 3 comments · 35 min read
Another plausible scenario of AI risk: AI builds military infrastructure while collaborating with humans, defects later. · avturchin · 10 Jun 2022 17:24 UTC · 10 points · 2 comments · 1 min read
How dangerous is human-level AI? · Alex_Altair · 10 Jun 2022 17:38 UTC · 20 points · 4 comments · 8 min read
On A List of Lethalities · Zvi · 13 Jun 2022 12:30 UTC · 147 points · 47 comments · 54 min read · (thezvi.wordpress.com)
Continuity Assumptions · Jan_Kulveit · 13 Jun 2022 21:31 UTC · 24 points · 13 comments · 4 min read
Slow motion videos as AI risk intuition pumps · Andrew_Critch · 14 Jun 2022 19:31 UTC · 193 points · 35 comments · 2 min read
Alignment Risk Doesn’t Require Superintelligence · JustisMills · 15 Jun 2022 3:12 UTC · 35 points · 4 comments · 2 min read
[Question] Has there been any work on attempting to use Pascal’s Mugging to make an AGI behave? · Chris_Leong · 15 Jun 2022 8:33 UTC · 7 points · 17 comments · 1 min read
Where I agree and disagree with Eliezer · paulfchristiano · 19 Jun 2022 19:15 UTC · 644 points · 169 comments · 20 min read
Are we there yet? · theflowerpot · 20 Jun 2022 11:19 UTC · 2 points · 2 comments · 1 min read
Confusion about neuroscience/cognitive science as a danger for AI Alignment · Samuel Nellessen · 22 Jun 2022 17:59 UTC · 2 points · 1 comment · 3 min read · (snellessen.com)
Access to AI: a human right? · dmtea · 25 Jul 2020 9:38 UTC · 5 points · 3 comments · 2 min read
Agentic Language Model Memes · FactorialCode · 1 Aug 2020 18:03 UTC · 16 points · 1 comment · 2 min read
Conversation with Paul Christiano · abergal · 11 Sep 2019 23:20 UTC · 44 points · 6 comments · 30 min read · (aiimpacts.org)
Transcription of Eliezer’s January 2010 video Q&A · curiousepic · 14 Nov 2011 17:02 UTC · 110 points · 9 comments · 56 min read
Responses to Catastrophic AGI Risk: A Survey · lukeprog · 8 Jul 2013 14:33 UTC · 17 points · 8 comments · 1 min read
How can I reduce existential risk from AI? · lukeprog · 13 Nov 2012 21:56 UTC · 63 points · 92 comments · 8 min read
Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?” · capybaralet · 6 Feb 2019 19:09 UTC · 25 points · 17 comments · 1 min read
Reframing misaligned AGI’s: well-intentioned non-neurotypical assistants · zhukeepa · 1 Apr 2018 1:22 UTC · 46 points · 14 comments · 2 min read
When is unaligned AI morally valuable? · paulfchristiano · 25 May 2018 1:57 UTC · 61 points · 52 comments · 10 min read
Introducing the AI Alignment Forum (FAQ) · habryka, Ben Pace, Raemon and jimrandomh · 29 Oct 2018 21:07 UTC · 86 points · 8 comments · 6 min read
Swimming Upstream: A Case Study in Instrumental Rationality · TurnTrout · 3 Jun 2018 3:16 UTC · 68 points · 7 comments · 8 min read
Current AI Safety Roles for Software Engineers · ozziegooen · 9 Nov 2018 20:57 UTC · 70 points · 9 comments · 4 min read
[Question] Why is so much discussion happening in private Google Docs? · Wei_Dai · 12 Jan 2019 2:19 UTC · 100 points · 22 comments · 1 min read
Problems in AI Alignment that philosophers could potentially contribute to · Wei_Dai · 17 Aug 2019 17:38 UTC · 74 points · 14 comments · 2 min read
Two Neglected Problems in Human-AI Safety · Wei_Dai · 16 Dec 2018 22:13 UTC · 82 points · 24 comments · 2 min read
Announcement: AI alignment prize round 4 winners · cousin_it · 20 Jan 2019 14:46 UTC · 74 points · 41 comments · 1 min read
Soon: a weekly AI Safety prerequisites module on LessWrong · null · 30 Apr 2018 13:23 UTC · 35 points · 10 comments · 1 min read
And the AI would have got away with it too, if... · Stuart_Armstrong · 22 May 2019 21:35 UTC · 75 points · 7 comments · 1 min read
2017 AI Safety Literature Review and Charity Comparison · Larks · 24 Dec 2017 18:52 UTC · 41 points · 5 comments · 23 min read
Should ethicists be inside or outside a profession? · Eliezer Yudkowsky · 12 Dec 2018 1:40 UTC · 83 points · 6 comments · 9 min read
I Vouch For MIRI · Zvi · 17 Dec 2017 17:50 UTC · 36 points · 9 comments · 5 min read · (thezvi.wordpress.com)
Beware of black boxes in AI alignment research · cousin_it · 18 Jan 2018 15:07 UTC · 39 points · 10 comments · 1 min read
AI Alignment Prize: Round 2 due March 31, 2018 · Zvi · 12 Mar 2018 12:10 UTC · 28 points · 2 comments · 3 min read · (thezvi.wordpress.com)
Three AI Safety Related Ideas · Wei_Dai · 13 Dec 2018 21:32 UTC · 68 points · 38 comments · 2 min read
A rant against robots · Lê Nguyên Hoang · 14 Jan 2020 22:03 UTC · 64 points · 7 comments · 5 min read
Opportunities for individual donors in AI safety · Alex Flint · 31 Mar 2018 18:37 UTC · 27 points · 3 comments · 11 min read
But exactly how complex and fragile? · KatjaGrace · 3 Nov 2019 18:20 UTC · 73 points · 32 comments · 3 min read · 1 review · (meteuphoric.com)
Course recommendations for Friendliness researchers · Louie · 9 Jan 2013 14:33 UTC · 96 points · 112 comments · 10 min read
AI Safety Research Camp—Project Proposal · David_Kristoffersson · 2 Feb 2018 4:25 UTC · 29 points · 11 comments · 8 min read
AI Summer Fellows Program · colm · 21 Mar 2018 15:32 UTC · 21 points · 0 comments · 1 min read
The genie knows, but doesn’t care · Rob Bensinger · 6 Sep 2013 6:42 UTC · 89 points · 519 comments · 8 min read
Alignment Newsletter #13: 07/02/18 · Rohin Shah · 2 Jul 2018 16:10 UTC · 70 points · 12 comments · 8 min read · (mailchi.mp)
An Increasingly Manipulative Newsfeed · Michaël Trazzi · 1 Jul 2019 15:26 UTC · 61 points · 16 comments · 5 min read
The simple picture on AI safety · Alex Flint · 27 May 2018 19:43 UTC · 28 points · 10 comments · 2 min read
Elon Musk donates $10M to the Future of Life Institute to keep AI beneficial · Paul Crowley · 15 Jan 2015 16:33 UTC · 78 points · 52 comments · 1 min read
Strategic implications of AIs’ ability to coordinate at low cost, for example by merging · Wei_Dai · 25 Apr 2019 5:08 UTC · 60 points · 45 comments · 2 min read · 1 review
Modeling AGI Safety Frameworks with Causal Influence Diagrams · Ramana Kumar · 21 Jun 2019 12:50 UTC · 43 points · 6 comments · 1 min read · (arxiv.org)
Henry Kissinger: AI Could Mean the End of Human History · ESRogs · 15 May 2018 20:11 UTC · 17 points · 12 comments · 1 min read · (www.theatlantic.com)
Toy model of the AI control problem: animated version · Stuart_Armstrong · 10 Oct 2017 11:06 UTC · 25 points · 8 comments · 1 min read
A Visualization of Nick Bostrom’s Superintelligence · [deleted] · 23 Jul 2014 0:24 UTC · 62 points · 28 comments · 3 min read
AI Alignment Research Overview (by Jacob Steinhardt) · Ben Pace · 6 Nov 2019 19:24 UTC · 43 points · 0 comments · 7 min read · (docs.google.com)
A general model of safety-oriented AI development · Wei_Dai · 11 Jun 2018 21:00 UTC · 65 points · 8 comments · 1 min read
Counterfactual Oracles = online supervised learning with random selection of training episodes · Wei_Dai · 10 Sep 2019 8:29 UTC · 44 points · 26 comments · 3 min read
Siren worlds and the perils of over-optimised search · Stuart_Armstrong · 7 Apr 2014 11:00 UTC · 73 points · 417 comments · 7 min read
Top 9+2 myths about AI risk · Stuart_Armstrong · 29 Jun 2015 20:41 UTC · 65 points · 46 comments · 2 min read
Rohin Shah on reasons for AI optimism · abergal · 31 Oct 2019 12:10 UTC · 40 points · 58 comments · 1 min read · (aiimpacts.org)
Plausibly, almost every powerful algorithm would be manipulative · Stuart_Armstrong · 6 Feb 2020 11:50 UTC · 38 points · 25 comments · 3 min read
The Magnitude of His Own Folly · Eliezer Yudkowsky · 30 Sep 2008 11:31 UTC · 74 points · 128 comments · 6 min read
AI alignment landscape · paulfchristiano · 13 Oct 2019 2:10 UTC · 40 points · 3 comments · 1 min read · (ai-alignment.com)
Launched: Friendship is Optimal · iceman · 15 Nov 2012 4:57 UTC · 72 points · 32 comments · 1 min read
Friendship is Optimal: A My Little Pony fanfic about an optimization process · iceman · 8 Sep 2012 6:16 UTC · 109 points · 152 comments · 1 min read
Do Earths with slower economic growth have a better chance at FAI? · Eliezer Yudkowsky · 12 Jun 2013 19:54 UTC · 54 points · 176 comments · 4 min read
Idea: Open Access AI Safety Journal · G Gordon Worley III · 23 Mar 2018 18:27 UTC · 28 points · 11 comments · 1 min read
G.K. Chesterton On AI Risk · Scott Alexander · 1 Apr 2017 19:00 UTC · 15 points · 0 comments · 7 min read
The Hidden Complexity of Wishes · Eliezer Yudkowsky · 24 Nov 2007 0:12 UTC · 130 points · 135 comments · 7 min read
The Friendly AI Game · bentarm · 15 Mar 2011 16:45 UTC · 50 points · 178 comments · 1 min read
Q&A with Jürgen Schmidhuber on risks from AI · XiXiDu · 15 Jun 2011 15:51 UTC · 57 points · 45 comments · 4 min read
[Question] What should an Einstein-like figure in Machine Learning do? · Razied · 5 Aug 2020 23:52 UTC · 3 points · 3 comments · 1 min read
Takeaways from safety by default interviews · AI Impacts and abergal · 3 Apr 2020 17:20 UTC · 23 points · 2 comments · 13 min read · (aiimpacts.org)
Field-Building and Deep Models · Ben Pace · 13 Jan 2018 21:16 UTC · 21 points · 12 comments · 4 min read
Critique my Model: The EV of AGI to Selfish Individuals · ozziegooen · 8 Apr 2018 20:04 UTC · 19 points · 9 comments · 4 min read
‘Dumb’ AI observes and manipulates controllers · Stuart_Armstrong · 13 Jan 2015 13:35 UTC · 52 points · 19 comments · 2 min read
2019 AI Alignment Literature Review and Charity Comparison · Larks · 19 Dec 2019 3:00 UTC · 130 points · 18 comments · 62 min read
Book review: Architects of Intelligence by Martin Ford (2018) · ofer · 11 Aug 2020 17:30 UTC · 15 points · 0 comments · 2 min read
Qualitative Strategies of Friendliness · Eliezer Yudkowsky · 30 Aug 2008 2:12 UTC · 29 points · 56 comments · 12 min read
Dreams of Friendliness · Eliezer Yudkowsky · 31 Aug 2008 1:20 UTC · 31 points · 81 comments · 9 min read
Conceptual issues in AI safety: the paradigmatic gap · vedevazz · 24 Jun 2018 15:09 UTC · 33 points · 0 comments · 1 min read · (www.foldl.me)
On unfixably unsafe AGI architectures · Steven Byrnes · 19 Feb 2020 21:16 UTC · 30 points · 8 comments · 5 min read
A toy model of the treacherous turn · Stuart_Armstrong · 8 Jan 2016 12:58 UTC · 36 points · 13 comments · 6 min read
Allegory On AI Risk, Game Theory, and Mithril · James_Miller · 13 Feb 2017 20:41 UTC · 45 points · 57 comments · 3 min read
1hr talk: Intro to AGI safety · Steven Byrnes · 18 Jun 2019 21:41 UTC · 34 points · 4 comments · 24 min read
The Evil AI Overlord List · Stuart_Armstrong · 20 Nov 2012 17:02 UTC · 44 points · 80 comments · 1 min read
What I would like the SIAI to publish · XiXiDu · 1 Nov 2010 14:07 UTC · 36 points · 225 comments · 3 min read
Evaluating the feasibility of SI’s plan · JoshuaFox · 10 Jan 2013 8:17 UTC · 38 points · 188 comments · 4 min read
Q&A with experts on risks from AI #1 · XiXiDu · 8 Jan 2012 11:46 UTC · 45 points · 67 comments · 9 min read
Algo trading is a central example of AI risk · Vanessa Kosoy · 28 Jul 2018 20:31 UTC · 27 points · 5 comments · 1 min read
Will the world’s elites navigate the creation of AI just fine? · lukeprog · 31 May 2013 18:49 UTC · 36 points · 266 comments · 2 min read
Let’s talk about “Convergent Rationality” · capybaralet · 12 Jun 2019 21:53 UTC · 36 points · 33 comments · 6 min read
Breaking Oracles: superrationality and acausal trade · Stuart_Armstrong · 25 Nov 2019 10:40 UTC · 25 points · 15 comments · 1 min read
Q&A with Stan Franklin on risks from AI · XiXiDu · 11 Jun 2011 15:22 UTC · 36 points · 10 comments · 2 min read
Muehlhauser-Goertzel Dialogue, Part 1 · lukeprog · 16 Mar 2012 17:12 UTC · 42 points · 161 comments · 33 min read
[LINK] NYT Article about Existential Risk from AI · [deleted] · 28 Jan 2013 10:37 UTC · 38 points · 23 comments · 1 min read
Reframing the Problem of AI Progress · Wei_Dai · 12 Apr 2012 19:31 UTC · 32 points · 47 comments · 1 min read
New AI risks research institute at Oxford University · lukeprog · 16 Nov 2011 18:52 UTC · 36 points · 10 comments · 1 min read
Thoughts on the Feasibility of Prosaic AGI Alignment? · iamthouthouarti · 21 Aug 2020 23:25 UTC · 8 points · 10 comments · 1 min read
Memes and Rational Decisions · inferential · 9 Jan 2015 6:42 UTC · 35 points · 17 comments · 10 min read
Levels of AI Self-Improvement · avturchin · 29 Apr 2018 11:45 UTC · 11 points · 0 comments · 39 min read
Optimising Society to Constrain Risk of War from an Artificial Superintelligence · JohnCDraper · 30 Apr 2020 10:47 UTC · 3 points · 1 comment · 51 min read
Some Thoughts on Singularity Strategies · Wei_Dai · 13 Jul 2011 2:41 UTC · 40 points · 29 comments · 3 min read
A trick for Safer GPT-N · Razied · 23 Aug 2020 0:39 UTC · 7 points · 1 comment · 2 min read
against “AI risk” · Wei_Dai · 11 Apr 2012 22:46 UTC · 35 points · 91 comments · 1 min read
“Smarter than us” is out! · Stuart_Armstrong · 25 Feb 2014 15:50 UTC · 41 points · 57 comments · 1 min read
Analysing: Dangerous messages from future UFAI via Oracles · Stuart_Armstrong · 22 Nov 2019 14:17 UTC · 22 points · 16 comments · 4 min read
Q&A with Abram Demski on risks from AI · XiXiDu · 17 Jan 2012 9:43 UTC · 33 points · 71 comments · 9 min read
Q&A with experts on risks from AI #2 · XiXiDu · 9 Jan 2012 19:40 UTC · 22 points · 29 comments · 7 min read
AI Safety Discussion Day · Linda Linsefors · 15 Sep 2020 14:40 UTC · 20 points · 0 comments · 1 min read
A long reply to Ben Garfinkel on Scrutinizing Classic AI Risk Arguments · Søren Elverlin · 27 Sep 2020 17:51 UTC · 17 points · 6 comments · 1 min read
Online AI Safety Discussion Day · Linda Linsefors · 8 Oct 2020 12:11 UTC · 5 points · 0 comments · 1 min read
Military AI as a Convergent Goal of Self-Improving AI · avturchin · 13 Nov 2017 12:17 UTC · 5 points · 3 comments · 1 min read
Neural program synthesis is a dangerous technology · syllogism · 12 Jan 2018 16:19 UTC · 9 points · 6 comments · 2 min read
New, Brief Popular-Level Introduction to AI Risks and Superintelligence · LyleN · 23 Jan 2015 15:43 UTC · 33 points · 3 comments · 1 min read
FAI Research Constraints and AGI Side Effects · JustinShovelain · 3 Jun 2015 19:25 UTC · 26 points · 59 comments · 7 min read
European Master’s Programs in Machine Learning, Artificial Intelligence, and related fields · Master Programs ML/AI · 14 Nov 2020 15:51 UTC · 32 points · 8 comments · 1 min read
The mind-killer · Paul Crowley · 2 May 2009 16:49 UTC · 29 points · 160 comments · 2 min read
[Question] Should I do it? · MrLight · 19 Nov 2020 1:08 UTC · −3 points · 16 comments · 2 min read
Rationalising humans: another mugging, but not Pascal’s · Stuart_Armstrong · 14 Nov 2017 15:46 UTC · 7 points · 1 comment · 3 min read
Machine learning could be fundamentally unexplainable · George · 16 Dec 2020 13:32 UTC · 26 points · 15 comments · 15 min read · (cerebralab.com)
[Question] What do you make of AGI:unaligned::spaceships:not enough food? · Ronny · 22 Feb 2020 14:14 UTC · 4 points · 3 comments · 1 min read
Risk Map of AI Systems · VojtaKovarik and Jan_Kulveit · 15 Dec 2020 9:16 UTC · 25 points · 3 comments · 8 min read
Edge of the Cliff · akaTrickster · 5 Jan 2021 17:21 UTC · 1 point · 0 comments · 5 min read
[Question] Does it become easier, or harder, for the world to coordinate around not building AGI as time goes on? · Eli Tyre · 29 Jul 2019 22:59 UTC · 86 points · 31 comments · 3 min read · 2 reviews
Grey Goo Requires AI · harsimony · 15 Jan 2021 4:45 UTC · 8 points · 11 comments · 4 min read · (harsimony.wordpress.com)
AISU 2021 · Linda Linsefors · 30 Jan 2021 17:40 UTC · 28 points · 2 comments · 1 min read
Nonperson Predicates · Eliezer Yudkowsky · 27 Dec 2008 1:47 UTC · 49 points · 176 comments · 6 min read
Engaging First Introductions to AI Risk · Rob Bensinger · 19 Aug 2013 6:26 UTC · 31 points · 21 comments · 3 min read
Formal Solution to the Inner Alignment Problem · michaelcohen · 18 Feb 2021 14:51 UTC · 46 points · 123 comments · 2 min read
[Question] What are the biggest current impacts of AI? · Sam Clarke · 7 Mar 2021 21:44 UTC · 15 points · 5 comments · 1 min read
[Question] Is a Self-Iterating AGI Vulnerable to Thompson-style Trojans? · sxae · 25 Mar 2021 14:46 UTC · 15 points · 7 comments · 3 min read
AI oracles on blockchain · Caravaggio · 6 Apr 2021 20:13 UTC · 5 points · 0 comments · 3 min read
What if AGI is near? · Wulky Wilkinsen · 14 Apr 2021 0:05 UTC · 11 points · 5 comments · 1 min read
[Question] Is there anything that can stop AGI development in the near term? · Wulky Wilkinsen · 22 Apr 2021 20:37 UTC · 5 points · 5 comments · 1 min read
[Question] [timeboxed exercise] write me your model of AI human-existential safety and the alignment problems in 15 minutes · Quinn · 4 May 2021 19:10 UTC · 6 points · 2 comments · 1 min read
AI Safety Research Project Ideas · Owain_Evans · 21 May 2021 13:39 UTC · 58 points · 2 comments · 3 min read
Survey on AI existential risk scenarios · Sam Clarke, Alexis Carlier and Jonas Schuett · 8 Jun 2021 17:12 UTC · 60 points · 11 comments · 7 min read
[Question] What are some claims or opinions about multi-multi delegation you’ve seen in the memeplex that you think deserve scrutiny? · Quinn · 27 Jun 2021 17:44 UTC · 16 points · 6 comments · 2 min read
Mauhn Releases AI Safety Documentation · Berg Severens · 3 Jul 2021 21:23 UTC · 4 points · 0 comments · 1 min read
A gentle apocalypse · pchvykov · 16 Aug 2021 5:03 UTC · 3 points · 5 comments · 3 min read
Could you have stopped Chernobyl? · Carlos Ramirez · 27 Aug 2021 1:48 UTC · 28 points · 17 comments · 8 min read
The Governance Problem and the “Pretty Good” X-Risk · Zach Stein-Perlman · 29 Aug 2021 18:00 UTC · 5 points · 2 comments · 12 min read
Distinguishing AI takeover scenarios · Sam Clarke and Sammy Martin · 8 Sep 2021 16:19 UTC · 62 points · 11 comments · 14 min read
The alignment problem in different capability regimes · Buck · 9 Sep 2021 19:46 UTC · 87 points · 12 comments · 5 min read
How truthful is GPT-3? A benchmark for language models · Owain_Evans · 16 Sep 2021 10:09 UTC · 54 points · 24 comments · 6 min read
Investigating AI Takeover Scenarios · Sammy Martin · 17 Sep 2021 18:47 UTC · 27 points · 1 comment · 27 min read
AI takeoff story: a continuation of progress by other means · Edouard Harris · 27 Sep 2021 15:55 UTC · 74 points · 13 comments · 10 min read
A brief review of the reasons multi-objective RL could be important in AI Safety Research · Ben Smith · 29 Sep 2021 17:09 UTC · 27 points · 7 comments · 10 min read
The Dark Side of Cognition Hypothesis · Cameron Berg · 3 Oct 2021 20:10 UTC · 19 points · 1 comment · 16 min read
Truthful AI: Developing and governing AI that does not lie · Owain_Evans, owencb and Lanrian · 18 Oct 2021 18:37 UTC · 81 points · 9 comments · 10 min read
AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors · Owain_Evans · 22 Oct 2021 16:23 UTC · 31 points · 15 comments · 1 min read
Truthful and honest AI · abergal, Nick_Beckstead and Owain_Evans · 29 Oct 2021 7:28 UTC · 41 points · 1 comment · 13 min read
What is the most evil AI that we could build, today? · ThomasJ · 1 Nov 2021 19:58 UTC · −2 points · 14 comments · 1 min read
What are red flags for Neural Network suffering? · Marius Hobbhahn · 8 Nov 2021 12:51 UTC · 26 points · 15 comments · 12 min read
Hardcode the AGI to need our approval indefinitely? · MichaelStJules · 11 Nov 2021 7:04 UTC · 2 points · 2 comments · 1 min read
What would we do if alignment were futile? · Grant Demaree · 14 Nov 2021 8:09 UTC · 73 points · 43 comments · 3 min read
Two Stupid AI Alignment Ideas · aphyer · 16 Nov 2021 16:13 UTC · 24 points · 3 comments · 4 min read
Super intelligent AIs that don’t require alignment · Yair Halberstadt · 16 Nov 2021 19:55 UTC · 10 points · 2 comments · 6 min read
AI Tracker: monitoring current and near-future risks from superscale models · Edouard Harris and Jeremie Harris · 23 Nov 2021 19:16 UTC · 62 points · 13 comments · 3 min read · (aitracker.org)
HIRING: Inform and shape a new project on AI safety at Partnership on AI · Madhulika Srikumar · 24 Nov 2021 8:27 UTC · 6 points · 0 comments · 1 min read
How to measure FLOP/s for Neural Networks empirically? · Marius Hobbhahn · 29 Nov 2021 15:18 UTC · 15 points · 3 comments · 7 min read
Modeling Failure Modes of High-Level Machine Intelligence · Ben Cottier, Daniel_Eth and Sammy Martin · 6 Dec 2021 13:54 UTC · 54 points · 1 comment · 12 min read
HIRING: Inform and shape a new project on AI safety at Partnership on AI · madhu_lika · 7 Dec 2021 19:37 UTC · 1 point · 0 comments · 1 min read
Universality and the “Filter” · maggiehayes · 16 Dec 2021 0:47 UTC · 10 points · 3 comments · 11 min read
Reviews of “Is power-seeking AI an existential risk?” · Joe Carlsmith · 16 Dec 2021 20:48 UTC · 67 points · 20 comments · 1 min read
2+2: Ontological Framework · Lyrialtus · 1 Feb 2022 1:07 UTC · −15 points · 2 comments · 15 min read
Can the laws of physics/nature prevent hell? · superads91 · 6 Feb 2022 20:39 UTC · −7 points · 10 comments · 2 min read
How harmful are improvements in AI? + Poll · tilker and Marius Hobbhahn · 15 Feb 2022 18:16 UTC · 15 points · 4 comments · 8 min read
Preserving and continuing alignment research through a severe global catastrophe · A_donor · 6 Mar 2022 18:43 UTC · 36 points · 11 comments · 5 min read
Ask AI companies about what they are doing for AI safety? · Michael Chen · 9 Mar 2022 15:14 UTC · 50 points · 0 comments · 2 min read
Is There a Valley of Bad Civilizational Adequacy? · lbThingrb · 11 Mar 2022 19:49 UTC · 13 points · 1 comment · 2 min read
[Question] Danger(s) of theorem-proving AI? · Yitz · 16 Mar 2022 2:47 UTC · 7 points · 9 comments · 1 min read
It Looks Like You’re Trying To Take Over The World · gwern · 9 Mar 2022 16:35 UTC · 376 points · 124 comments · 1 min read · (www.gwern.net)
We Are Conjecture, A New Alignment Research Startup · Connor Leahy · 8 Apr 2022 11:40 UTC · 170 points · 24 comments · 4 min read
Clippy’s modest proposal · Daphne_W · 11 Apr 2022 21:00 UTC · 5 points · 13 comments · 9 min read
Is technical AI alignment research a net positive? · cranberry_bear · 12 Apr 2022 13:07 UTC · 4 points · 2 comments · 2 min read
The Peerless · carado · 13 Apr 2022 1:07 UTC · 3 points · 2 comments · 1 min read · (carado.moe)
[Question] Can someone explain to me why MIRI is so pessimistic of our chances of survival? · iamthouthouarti · 14 Apr 2022 20:28 UTC · 10 points · 7 comments · 1 min read
[Question] Convince me that humanity *isn’t* doomed by AGI · Yitz · 15 Apr 2022 17:26 UTC · 59 points · 53 comments · 1 min read
Reflections on My Own Missing Mood · Conor Sullivan · 21 Apr 2022 16:19 UTC · 51 points · 25 comments · 5 min read
Eight Short Studies On Excuses · Scott Alexander · 20 Apr 2010 23:01 UTC · 645 points · 244 comments · 10 min read
Code Generation as an AI risk setting · Not Relevant · 17 Apr 2022 22:27 UTC · 91 points · 16 comments · 2 min read
[Question] What is being improved in recursive self improvement? · Conor Sullivan · 25 Apr 2022 18:30 UTC · 6 points · 7 comments · 1 min read
AI Alternative Futures: Scenario Mapping Artificial Intelligence Risk—Request for Participation (*Edit*) · Kakili · 27 Apr 2022 22:07 UTC · 10 points · 2 comments · 9 min read
Video and Transcript of Presentation on Existential Risk from Power-Seeking AI · Joe Carlsmith · 8 May 2022 3:50 UTC · 20 points · 1 comment · 29 min read
Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios · Evan R. Murphy · 12 May 2022 20:01 UTC · 41 points · 0 comments · 59 min read
Agency As a Natural Abstraction · Thane Ruthenis · 13 May 2022 18:02 UTC · 41 points · 8 comments · 13 min read
[Link post] Promising Paths to Alignment—Connor Leahy | Talk · frances_lorenz · 14 May 2022 16:01 UTC · 33 points · 0 comments · 1 min read
DeepMind’s generalist AI, Gato: A non-technical explainer · frances_lorenz, Nora Belrose and jonmenaster · 16 May 2022 21:21 UTC · 57 points · 6 comments · 6 min read
Actionable-guidance and roadmap recommendations for the NIST AI Risk Management Framework · Dan Hendrycks and Tony Barrett · 17 May 2022 15:26 UTC · 24 points · 0 comments · 3 min read
In defence of flailing · acylhalide · 18 Jun 2022 5:26 UTC · 10 points · 13 comments · 4 min read
Why I’m Optimistic About Near-Term AI Risk · harsimony · 15 May 2022 23:05 UTC · 47 points · 28 comments · 1 min read
Reshaping the AI Industry · Thane Ruthenis · 29 May 2022 22:54 UTC · 133 points · 34 comments · 21 min read
Explaining inner alignment to myself · Jeremy Gillen · 24 May 2022 23:10 UTC · 6 points · 2 comments · 10 min read
Right now, you’re sitting on a REDONKULOUS opportunity to help solve AGI (and rake in $$$) · Trevor1 · 26 May 2022 21:55 UTC · 41 points · 12 comments · 2 min read
A Story of AI Risk: InstructGPT-N · peterbarnett · 26 May 2022 23:22 UTC · 21 points · 0 comments · 8 min read
We will be around in 30 years · mukashi · 7 Jun 2022 3:47 UTC · 13 points · 205 comments · 2 min read
Transformer Research Questions from Stained Glass Windows · StefanHex · 8 Jun 2022 12:38 UTC · 4 points · 0 comments · 2 min read
Towards Gears-Level Understanding of Agency · Thane Ruthenis · 16 Jun 2022 22:00 UTC · 7 points · 4 comments · 18 min read
A plausible story about AI risk. · DeLesley Hutchins · 10 Jun 2022 2:08 UTC · 13 points · 1 comment · 4 min read
Summary of “AGI Ruin: A List of Lethalities” · Stephen McAleese · 10 Jun 2022 22:35 UTC · 32 points · 2 comments · 8 min read
Poorly-Aimed Death Rays · Thane Ruthenis · 11 Jun 2022 18:29 UTC · 39 points · 5 comments · 4 min read
Contra EY: Can AGI destroy us without trial & error? · Nikita Sokolsky · 13 Jun 2022 18:26 UTC · 123 points · 71 comments · 15 min read
A Modest Pivotal Act · anonymousaisafety · 13 Jun 2022 19:24 UTC · −17 points · 0 comments · 5 min read
Alignment research for “meta” purposes · acylhalide · 16 Jun 2022 14:03 UTC · 12 points · 0 comments · 1 min read
Specific problems with specific animal comparisons for AI policy · Trevor1 · 19 Jun 2022 1:27 UTC · 3 points · 1 comment · 2 min read
[Question] AI misalignment risk from GPT-like systems? · fiso · 19 Jun 2022 17:35 UTC · 10 points · 8 comments · 1 min read
Causal confusion as an argument against the scaling hypothesis · RobertKirk and David Krueger · 20 Jun 2022 10:54 UTC · 78 points · 24 comments · 18 min read
[Question] Is the study of AI an infohazard? · blackstampede · 20 Jun 2022 14:25 UTC · 6 points · 15 comments · 1 min read
[LQ] Some Thoughts on Messaging Around AI Risk · 𝕮𝖎𝖓𝖊𝖗𝖆 · 25 Jun 2022 13:53 UTC · 9 points · 3 comments · 6 min read