AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
5 Jun 2022 22:05 UTC
981
points
715
comments
30
min read
LW
link
3
reviews
Where I agree and disagree with Eliezer
paulfchristiano
19 Jun 2022 19:15 UTC
933
points
224
comments
18
min read
LW
link
2
reviews
Simulators
janus
2 Sep 2022 12:45 UTC
713
points
170
comments
41
min read
LW
link
8
reviews
(generative.ink)
What an actually pessimistic containment strategy looks like
lc
5 Apr 2022 0:19 UTC
687
points
138
comments
6
min read
LW
link
2
reviews
Let’s think about slowing down AI
KatjaGrace
22 Dec 2022 17:40 UTC
562
points
182
comments
38
min read
LW
link
3
reviews
(aiimpacts.org)
The Redaction Machine
Ben
20 Sep 2022 22:03 UTC
530
points
48
comments
27
min read
LW
link
1
review
Losing the root for the tree
Adam Zerner
20 Sep 2022 4:53 UTC
511
points
31
comments
9
min read
LW
link
1
review
Luck based medicine: my resentful story of becoming a medical miracle
Elizabeth
16 Oct 2022 17:40 UTC
497
points
121
comments
12
min read
LW
link
3
reviews
(acesounderglass.com)
Counter-theses on Sleep
Natália
21 Mar 2022 23:21 UTC
455
points
135
comments
15
min read
LW
link
1
review
It’s Probably Not Lithium
Natália
28 Jun 2022 21:24 UTC
447
points
186
comments
28
min read
LW
link
1
review
You Are Not Measuring What You Think You Are Measuring
johnswentworth
20 Sep 2022 20:04 UTC
442
points
45
comments
8
min read
LW
link
2
reviews
chinchilla’s wild implications
nostalgebraist
31 Jul 2022 1:18 UTC
425
points
129
comments
10
min read
LW
link
1
review
It Looks Like You’re Trying To Take Over The World
gwern
9 Mar 2022 16:35 UTC
418
points
120
comments
1
min read
LW
link
1
review
(www.gwern.net)
(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen
and
elifland
29 Aug 2022 1:23 UTC
415
points
90
comments
37
min read
LW
link
1
review
Lies Told To Children
Eliezer Yudkowsky
14 Apr 2022 11:25 UTC
401
points
99
comments
7
min read
LW
link
1
review
MIRI announces new “Death With Dignity” strategy
Eliezer Yudkowsky
2 Apr 2022 0:43 UTC
398
points
546
comments
18
min read
LW
link
1
review
DeepMind alignment team opinions on AGI ruin arguments
Vika
12 Aug 2022 21:06 UTC
397
points
37
comments
14
min read
LW
link
1
review
Reflections on six months of fatherhood
jasoncrawford
31 Jan 2022 5:28 UTC
395
points
24
comments
4
min read
LW
link
1
review
(jasoncrawford.org)
Reward is not the optimization target
TurnTrout
25 Jul 2022 0:03 UTC
386
points
128
comments
10
min read
LW
link
3
reviews
Staring into the abyss as a core life skill
benkuhn
22 Dec 2022 15:30 UTC
379
points
24
comments
12
min read
LW
link
1
review
(www.benkuhn.net)
A Mechanistic Interpretability Analysis of Grokking
Neel Nanda
and
Tom Lieberum
15 Aug 2022 2:41 UTC
377
points
48
comments
36
min read
LW
link
1
review
(colab.research.google.com)
Counterarguments to the basic AI x-risk case
KatjaGrace
14 Oct 2022 13:00 UTC
376
points
126
comments
34
min read
LW
link
1
review
(aiimpacts.org)
Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover
Ajeya Cotra
18 Jul 2022 19:06 UTC
375
points
95
comments
75
min read
LW
link
1
review
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
elspood
21 Jun 2022 23:55 UTC
370
points
42
comments
7
min read
LW
link
1
review
Accounting For College Costs
johnswentworth
1 Apr 2022 17:28 UTC
368
points
41
comments
7
min read
LW
link
What DALL-E 2 can and cannot do
Swimmer963 (Miranda Dixon-Luinenburg)
1 May 2022 23:51 UTC
353
points
305
comments
9
min read
LW
link
Optimality is the tiger, and agents are its teeth
Veedrac
2 Apr 2022 0:46 UTC
352
points
46
comments
16
min read
LW
link
1
review
Models Don’t “Get Reward”
Sam Ringer
30 Dec 2022 10:37 UTC
348
points
64
comments
5
min read
LW
link
1
review
What should you change in response to an “emergency”? And AI risk
AnnaSalamon
18 Jul 2022 1:11 UTC
346
points
60
comments
7
min read
LW
link
1
review
Why I think strong general AI is coming soon
porby
28 Sep 2022 5:40 UTC
344
points
141
comments
34
min read
LW
link
1
review
Beware boasting about non-existent forecasting track records
Jotto999
20 May 2022 19:20 UTC
339
points
112
comments
5
min read
LW
link
1
review
Looking back on my alignment PhD
TurnTrout
1 Jul 2022 3:19 UTC
334
points
67
comments
11
min read
LW
link
Six Dimensions of Operational Adequacy in AGI Projects
Eliezer Yudkowsky
30 May 2022 17:00 UTC
325
points
66
comments
13
min read
LW
link
1
review
On how various plans miss the hard bits of the alignment challenge
So8res
12 Jul 2022 2:49 UTC
322
points
91
comments
29
min read
LW
link
3
reviews
Epistemic Legibility
Elizabeth
9 Feb 2022 18:10 UTC
320
points
30
comments
20
min read
LW
link
1
review
(acesounderglass.com)
Why Agent Foundations? An Overly Abstract Explanation
johnswentworth
25 Mar 2022 23:17 UTC
319
points
60
comments
8
min read
LW
link
1
review
Sazen
Duncan Sabien (Inactive)
21 Dec 2022 7:54 UTC
305
points
87
comments
12
min read
LW
link
2
reviews
A challenge for AGI organizations, and a challenge for readers
Rob Bensinger
and
Eliezer Yudkowsky
1 Dec 2022 23:11 UTC
304
points
33
comments
2
min read
LW
link
Don’t die with dignity; instead play to your outs
Jeffrey Ladish
6 Apr 2022 7:53 UTC
303
points
59
comments
5
min read
LW
link
Mysteries of mode collapse
janus
8 Nov 2022 10:37 UTC
303
points
57
comments
14
min read
LW
link
1
review
What Are You Tracking In Your Head?
johnswentworth
28 Jun 2022 19:30 UTC
303
points
84
comments
4
min read
LW
link
1
review
Toni Kurz and the Insanity of Climbing Mountains
GeneSmith
3 Jul 2022 20:51 UTC
295
points
73
comments
11
min read
LW
link
2
reviews
Two-year update on my personal AI timelines
Ajeya Cotra
2 Aug 2022 23:07 UTC
293
points
60
comments
16
min read
LW
link
We Choose To Align AI
johnswentworth
1 Jan 2022 20:06 UTC
283
points
17
comments
3
min read
LW
link
1
review
A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res
15 Jun 2022 13:10 UTC
281
points
56
comments
10
min read
LW
link
1
review
Is AI Progress Impossible To Predict?
alyssavance
15 May 2022 18:30 UTC
278
points
39
comments
2
min read
LW
link
12 interesting things I learned studying the discovery of nature’s laws
Ben Pace
19 Feb 2022 23:39 UTC
276
points
41
comments
9
min read
LW
link
1
review
Comment reply: my low-quality thoughts on why CFAR didn’t get farther with a “real/efficacious art of rationality”
AnnaSalamon
9 Jun 2022 2:12 UTC
275
points
81
comments
17
min read
LW
link
1
review
Changing the world through slack & hobbies
Steven Byrnes
21 Jul 2022 18:11 UTC
272
points
13
comments
10
min read
LW
link
Humans are very reliable agents
alyssavance
16 Jun 2022 22:02 UTC
271
points
35
comments
3
min read
LW
link