AGI Ruin: A List of Lethalities · Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC · 957 points · 711 comments · 30 min read · LW link · 3 reviews
Where I agree and disagree with Eliezer · paulfchristiano · 19 Jun 2022 19:15 UTC · 911 points · 224 comments · 18 min read · LW link · 2 reviews
What an actually pessimistic containment strategy looks like · lc · 5 Apr 2022 0:19 UTC · 685 points · 138 comments · 6 min read · LW link · 2 reviews
Simulators · janus · 2 Sep 2022 12:45 UTC · 673 points · 168 comments · 41 min read · LW link · 8 reviews · (generative.ink)
Let’s think about slowing down AI · KatjaGrace · 22 Dec 2022 17:40 UTC · 556 points · 182 comments · 38 min read · LW link · 3 reviews · (aiimpacts.org)
The Redaction Machine · Ben · 20 Sep 2022 22:03 UTC · 522 points · 48 comments · 27 min read · LW link · 1 review
Losing the root for the tree · Adam Zerner · 20 Sep 2022 4:53 UTC · 501 points · 31 comments · 9 min read · LW link · 1 review
Luck based medicine: my resentful story of becoming a medical miracle · Elizabeth · 16 Oct 2022 17:40 UTC · 495 points · 121 comments · 12 min read · LW link · 3 reviews · (acesounderglass.com)
Counter-theses on Sleep · Natália · 21 Mar 2022 23:21 UTC · 453 points · 135 comments · 15 min read · LW link · 1 review
It’s Probably Not Lithium · Natália · 28 Jun 2022 21:24 UTC · 447 points · 186 comments · 28 min read · LW link · 1 review
You Are Not Measuring What You Think You Are Measuring · johnswentworth · 20 Sep 2022 20:04 UTC · 430 points · 45 comments · 8 min read · LW link · 2 reviews
chinchilla’s wild implications · nostalgebraist · 31 Jul 2022 1:18 UTC · 425 points · 128 comments · 10 min read · LW link · 1 review
(My understanding of) What Everyone in Technical Alignment is Doing and Why · Thomas Larsen and elifland · 29 Aug 2022 1:23 UTC · 413 points · 90 comments · 37 min read · LW link · 1 review
It Looks Like You’re Trying To Take Over The World · gwern · 9 Mar 2022 16:35 UTC · 413 points · 120 comments · 1 min read · LW link · 1 review · (www.gwern.net)
DeepMind alignment team opinions on AGI ruin arguments · Vika · 12 Aug 2022 21:06 UTC · 397 points · 37 comments · 14 min read · LW link · 1 review
Lies Told To Children · Eliezer Yudkowsky · 14 Apr 2022 11:25 UTC · 393 points · 97 comments · 7 min read · LW link · 1 review
Reflections on six months of fatherhood · jasoncrawford · 31 Jan 2022 5:28 UTC · 391 points · 24 comments · 4 min read · LW link · 1 review · (jasoncrawford.org)
Reward is not the optimization target · TurnTrout · 25 Jul 2022 0:03 UTC · 384 points · 128 comments · 10 min read · LW link · 3 reviews
MIRI announces new “Death With Dignity” strategy · Eliezer Yudkowsky · 2 Apr 2022 0:43 UTC · 383 points · 547 comments · 18 min read · LW link · 1 review
Counterarguments to the basic AI x-risk case · KatjaGrace · 14 Oct 2022 13:00 UTC · 375 points · 125 comments · 34 min read · LW link · 1 review · (aiimpacts.org)
A Mechanistic Interpretability Analysis of Grokking · Neel Nanda and Tom Lieberum · 15 Aug 2022 2:41 UTC · 374 points · 48 comments · 36 min read · LW link · 1 review · (colab.research.google.com)
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover · Ajeya Cotra · 18 Jul 2022 19:06 UTC · 372 points · 95 comments · 75 min read · LW link · 1 review
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment · elspood · 21 Jun 2022 23:55 UTC · 369 points · 42 comments · 7 min read · LW link · 1 review
Staring into the abyss as a core life skill · benkuhn · 22 Dec 2022 15:30 UTC · 368 points · 24 comments · 12 min read · LW link · 1 review · (www.benkuhn.net)
Accounting For College Costs · johnswentworth · 1 Apr 2022 17:28 UTC · 368 points · 41 comments · 7 min read · LW link
What DALL-E 2 can and cannot do · Swimmer963 (Miranda Dixon-Luinenburg) · 1 May 2022 23:51 UTC · 353 points · 304 comments · 9 min read · LW link
Optimality is the tiger, and agents are its teeth · Veedrac · 2 Apr 2022 0:46 UTC · 350 points · 46 comments · 16 min read · LW link · 1 review
What should you change in response to an “emergency”? And AI risk · AnnaSalamon · 18 Jul 2022 1:11 UTC · 346 points · 60 comments · 7 min read · LW link · 1 review
Beware boasting about non-existent forecasting track records · Jotto999 · 20 May 2022 19:20 UTC · 340 points · 112 comments · 5 min read · LW link · 1 review
Why I think strong general AI is coming soon · porby · 28 Sep 2022 5:40 UTC · 340 points · 141 comments · 34 min read · LW link · 1 review
Looking back on my alignment PhD · TurnTrout · 1 Jul 2022 3:19 UTC · 334 points · 67 comments · 11 min read · LW link
Models Don’t “Get Reward” · Sam Ringer · 30 Dec 2022 10:37 UTC · 328 points · 64 comments · 5 min read · LW link · 1 review
Epistemic Legibility · Elizabeth · 9 Feb 2022 18:10 UTC · 319 points · 30 comments · 20 min read · LW link · 1 review · (acesounderglass.com)
Six Dimensions of Operational Adequacy in AGI Projects · Eliezer Yudkowsky · 30 May 2022 17:00 UTC · 317 points · 66 comments · 13 min read · LW link · 1 review
On how various plans miss the hard bits of the alignment challenge · So8res · 12 Jul 2022 2:49 UTC · 316 points · 89 comments · 29 min read · LW link · 3 reviews
Why Agent Foundations? An Overly Abstract Explanation · johnswentworth · 25 Mar 2022 23:17 UTC · 313 points · 60 comments · 8 min read · LW link · 1 review
A challenge for AGI organizations, and a challenge for readers · Rob Bensinger and Eliezer Yudkowsky · 1 Dec 2022 23:11 UTC · 303 points · 33 comments · 2 min read · LW link
Don’t die with dignity; instead play to your outs · Jeffrey Ladish · 6 Apr 2022 7:53 UTC · 302 points · 60 comments · 5 min read · LW link
Sazen · Duncan Sabien (Inactive) · 21 Dec 2022 7:54 UTC · 295 points · 85 comments · 12 min read · LW link · 2 reviews
What Are You Tracking In Your Head? · johnswentworth · 28 Jun 2022 19:30 UTC · 294 points · 84 comments · 4 min read · LW link · 1 review
Two-year update on my personal AI timelines · Ajeya Cotra · 2 Aug 2022 23:07 UTC · 293 points · 60 comments · 16 min read · LW link
Toni Kurz and the Insanity of Climbing Mountains · GeneSmith · 3 Jul 2022 20:51 UTC · 289 points · 73 comments · 11 min read · LW link · 2 reviews
Mysteries of mode collapse · janus · 8 Nov 2022 10:37 UTC · 288 points · 57 comments · 14 min read · LW link · 1 review
We Choose To Align AI · johnswentworth · 1 Jan 2022 20:06 UTC · 283 points · 17 comments · 3 min read · LW link · 1 review
A central AI alignment problem: capabilities generalization, and the sharp left turn · So8res · 15 Jun 2022 13:10 UTC · 278 points · 56 comments · 10 min read · LW link · 1 review
Is AI Progress Impossible To Predict? · alyssavance · 15 May 2022 18:30 UTC · 278 points · 39 comments · 2 min read · LW link
Comment reply: my low-quality thoughts on why CFAR didn’t get farther with a “real/efficacious art of rationality” · AnnaSalamon · 9 Jun 2022 2:12 UTC · 274 points · 80 comments · 17 min read · LW link · 1 review
12 interesting things I learned studying the discovery of nature’s laws · Ben Pace · 19 Feb 2022 23:39 UTC · 271 points · 40 comments · 9 min read · LW link · 1 review
Humans are very reliable agents · alyssavance · 16 Jun 2022 22:02 UTC · 270 points · 35 comments · 3 min read · LW link
Changing the world through slack & hobbies · Steven Byrnes · 21 Jul 2022 18:11 UTC · 266 points · 13 comments · 10 min read · LW link