LessWrong Archive: August 2022, page 1
(My understanding of) What Everyone in Technical Alignment is Doing and Why · Thomas Larsen and elifland · Aug 29, 2022, 1:23 AM · 413 points · 90 comments · 37 min read · LW link · 1 review
DeepMind alignment team opinions on AGI ruin arguments · Vika · Aug 12, 2022, 9:06 PM · 395 points · 37 comments · 14 min read · LW link · 1 review
A Mechanistic Interpretability Analysis of Grokking · Neel Nanda and Tom Lieberum · Aug 15, 2022, 2:41 AM · 373 points · 48 comments · 36 min read · LW link · 1 review · (colab.research.google.com)
Two-year update on my personal AI timelines · Ajeya Cotra · Aug 2, 2022, 11:07 PM · 293 points · 60 comments · 16 min read · LW link
Common misconceptions about OpenAI · Jacob_Hilton · Aug 25, 2022, 2:02 PM · 238 points · 154 comments · 5 min read · LW link · 1 review
What do ML researchers think about AI in 2022? · KatjaGrace · Aug 4, 2022, 3:40 PM · 221 points · 33 comments · 3 min read · LW link · (aiimpacts.org)
How To Go From Interpretability To Alignment: Just Retarget The Search · johnswentworth · Aug 10, 2022, 4:08 PM · 210 points · 34 comments · 3 min read · LW link · 1 review
Worlds Where Iterative Design Fails · johnswentworth · Aug 30, 2022, 8:48 PM · 209 points · 30 comments · 10 min read · LW link · 1 review
Language models seem to be much better than humans at next-token prediction · Buck, Fabien Roger and LawrenceC · Aug 11, 2022, 5:45 PM · 182 points · 60 comments · 13 min read · LW link · 1 review
Some conceptual alignment research projects · Richard_Ngo · Aug 25, 2022, 10:51 PM · 177 points · 15 comments · 3 min read · LW link
Shard Theory: An Overview · David Udell · Aug 11, 2022, 5:44 AM · 166 points · 34 comments · 10 min read · LW link
Nate Soares’ Life Advice · CatGoddess · Aug 23, 2022, 2:46 AM · 157 points · 41 comments · 3 min read · LW link
Your posts should be on arXiv · JanB · Aug 25, 2022, 10:35 AM · 156 points · 44 comments · 3 min read · LW link
What’s General-Purpose Search, And Why Might We Expect To See It In Trained ML Systems? · johnswentworth · Aug 15, 2022, 10:48 PM · 156 points · 18 comments · 10 min read · LW link
Interpretability/Tool-ness/Alignment/Corrigibility are not Composable · johnswentworth · Aug 8, 2022, 6:05 PM · 144 points · 13 comments · 3 min read · LW link
The Parable of the Boy Who Cried 5% Chance of Wolf · KatWoods · Aug 15, 2022, 2:33 PM · 140 points · 24 comments · 2 min read · LW link
How might we align transformative AI if it’s developed very soon? · HoldenKarnofsky · Aug 29, 2022, 3:42 PM · 140 points · 55 comments · 45 min read · LW link · 1 review
Externalized reasoning oversight: a research direction for language model alignment · tamera · Aug 3, 2022, 12:03 PM · 136 points · 23 comments · 6 min read · LW link
Fiber arts, mysterious dodecahedrons, and waiting on “Eureka!” · eukaryote · Aug 4, 2022, 8:37 PM · 124 points · 15 comments · 9 min read · LW link · 1 review · (eukaryotewritesblog.com)
Taking the parameters which seem to matter and rotating them until they don’t · Garrett Baker · Aug 26, 2022, 6:26 PM · 120 points · 48 comments · 1 min read · LW link
Meditation course claims 65% enlightenment rate: my review · KatWoods · Aug 1, 2022, 11:25 AM · 111 points · 35 comments · 14 min read · LW link
Oversight Misses 100% of Thoughts The AI Does Not Think · johnswentworth · Aug 12, 2022, 4:30 PM · 111 points · 49 comments · 1 min read · LW link
The lessons of Xanadu · jasoncrawford · Aug 7, 2022, 5:59 PM · 110 points · 20 comments · 8 min read · LW link · (jasoncrawford.org)
The alignment problem from a deep learning perspective · Richard_Ngo · Aug 10, 2022, 10:46 PM · 107 points · 15 comments · 27 min read · LW link · 1 review
Beliefs and Disagreements about Automating Alignment Research · Ian McKenzie · Aug 24, 2022, 6:37 PM · 107 points · 4 comments · 7 min read · LW link
How likely is deceptive alignment? · evhub · Aug 30, 2022, 7:34 PM · 105 points · 28 comments · 60 min read · LW link
Rant on Problem Factorization for Alignment · johnswentworth · Aug 5, 2022, 7:23 PM · 104 points · 53 comments · 6 min read · LW link
Announcing Encultured AI: Building a Video Game · Andrew_Critch and Nick Hay · Aug 18, 2022, 2:16 AM · 103 points · 26 comments · 4 min read · LW link
Introducing Pastcasting: A tool for forecasting practice · Sage Future · Aug 11, 2022, 5:38 PM · 95 points · 10 comments · 2 min read · LW link · 2 reviews
Survey advice · KatjaGrace · Aug 24, 2022, 3:10 AM · 93 points · 11 comments · 3 min read · LW link · (worldspiritsockpuppet.com)
How to do theoretical research, a personal perspective · Mark Xu · Aug 19, 2022, 7:41 PM · 91 points · 6 comments · 15 min read · LW link
Less Threat-Dependent Bargaining Solutions?? (3/2) · Diffractor · Aug 20, 2022, 2:19 AM · 88 points · 7 comments · 6 min read · LW link
[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”? · TurnTrout · Aug 11, 2022, 10:22 PM · 87 points · 43 comments · 2 min read · LW link
Refining the Sharp Left Turn threat model, part 1: claims and mechanisms · Vika, Vikrant Varma, Ramana Kumar and Mary Phuong · Aug 12, 2022, 3:17 PM · 86 points · 4 comments · 3 min read · LW link · 1 review · (vkrakovna.wordpress.com)
High Reliability Orgs, and AI Companies · Raemon · Aug 4, 2022, 5:45 AM · 86 points · 7 comments · 12 min read · LW link · 1 review
I’m mildly skeptical that blindness prevents schizophrenia · Steven Byrnes · Aug 15, 2022, 11:36 PM · 83 points · 9 comments · 4 min read · LW link
Human Mimicry Mainly Works When We’re Already Close · johnswentworth · Aug 17, 2022, 6:41 PM · 82 points · 16 comments · 5 min read · LW link
Most Ivy-smart students aren’t at Ivy-tier schools · Aaron Bergman · Aug 7, 2022, 3:18 AM · 82 points · 7 comments · 8 min read · LW link · (www.aaronbergman.net)
«Boundaries», Part 2: trends in EA’s handling of boundaries · Andrew_Critch · Aug 6, 2022, 12:42 AM · 81 points · 15 comments · 7 min read · LW link
Paper is published! 100,000 lumens to treat seasonal affective disorder · Fabienne · Aug 20, 2022, 7:48 PM · 81 points · 3 comments · 1 min read · LW link · (www.lesswrong.com)
Evolution is a bad analogy for AGI: inner alignment · Quintin Pope · Aug 13, 2022, 10:15 PM · 81 points · 15 comments · 8 min read · LW link
What’s the Least Impressive Thing GPT-4 Won’t be Able to Do · Algon · Aug 20, 2022, 7:48 PM · 80 points · 125 comments · 1 min read · LW link
The Loire Is Not Dry · jefftk · Aug 20, 2022, 1:40 PM · 80 points · 2 comments · 1 min read · LW link · (www.jefftk.com)
AI strategy nearcasting · HoldenKarnofsky · Aug 25, 2022, 5:26 PM · 79 points · 4 comments · 9 min read · LW link
How (not) to choose a research project · Garrett Baker, CatGoddess and Johannes C. Mayer · Aug 9, 2022, 12:26 AM · 79 points · 11 comments · 7 min read · LW link
The Core of the Alignment Problem is... · Thomas Larsen, Jeremy Gillen and JamesH · Aug 17, 2022, 8:07 PM · 76 points · 10 comments · 9 min read · LW link
Discovering Agents · zac_kenton · Aug 18, 2022, 5:33 PM · 73 points · 11 comments · 6 min read · LW link
Announcing the Introduction to ML Safety course · Dan H, TW123 and ozhang · Aug 6, 2022, 2:46 AM (UTC) · 73 points · 6 comments · 7 min read · LW link
[Question] COVID-19 Group Testing Post-mortem? · gwern · Aug 5, 2022, 4:32 PM (UTC) · 72 points · 6 comments · 2 min read · LW link
The longest training run · Jsevillamol, Tamay, Owen D and anson.ho · Aug 17, 2022, 5:18 PM (UTC) · 71 points · 12 comments · 9 min read · LW link · (epochai.org)