Deep Honesty
Aletheophile
May 7, 2024, 8:31 PM
159
points
25
comments
9
min read
LW
link
Making every researcher seek grants is a broken model
jasoncrawford
Jan 26, 2024, 4:06 PM
159
points
41
comments
4
min read
LW
link
(rootsofprogress.org)
Current safety training techniques do not fully transfer to the agent setting
Simon Lermen
and
Govind Pimpale
Nov 3, 2024, 7:24 PM
158
points
9
comments
5
min read
LW
link
What’s up with LLMs representing XORs of arbitrary features?
Sam Marks
Jan 3, 2024, 7:44 PM
158
points
63
comments
16
min read
LW
link
Language Models Model Us
eggsyntax
May 17, 2024, 9:00 PM
158
points
55
comments
7
min read
LW
link
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper
May 21, 2024, 8:15 PM
157
points
16
comments
3
min read
LW
link
Ironing Out the Squiggles
Zack_M_Davis
Apr 29, 2024, 4:13 PM
157
points
36
comments
11
min read
LW
link
[Question]
things that confuse me about the current AI market.
DMMF
Aug 28, 2024, 1:46 PM
156
points
27
comments
2
min read
LW
link
Apologizing is a Core Rationalist Skill
johnswentworth
Jan 2, 2024, 5:47 PM
156
points
42
comments
5
min read
LW
link
If you weren’t such an idiot...
kave
and
Mark Xu
Mar 2, 2024, 12:01 AM
156
points
74
comments
2
min read
LW
link
(markxu.com)
The Incredible Fentanyl-Detecting Machine
sarahconstantin
Jun 28, 2024, 10:10 PM
156
points
26
comments
7
min read
LW
link
(sarahconstantin.substack.com)
Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton
Jun 25, 2024, 3:40 PM
156
points
11
comments
9
min read
LW
link
(www.alignment.org)
A Rocket–Interpretability Analogy
plex
Oct 21, 2024, 1:55 PM
155
points
31
comments
1
min read
LW
link
“It’s a 10% chance which I did 10 times, so it should be 100%”
egor.timatkov
Nov 18, 2024, 1:14 AM
154
points
59
comments
2
min read
LW
link
o3
Zach Stein-Perlman
Dec 20, 2024, 6:30 PM
154
points
164
comments
1
min read
LW
link
Dyslucksia
Shoshannah Tekofsky
May 9, 2024, 7:21 PM
154
points
45
comments
6
min read
LW
link
Subskills of “Listening to Wisdom”
Raemon
Dec 9, 2024, 3:01 AM
154
points
29
comments
42
min read
LW
link
Liability regimes for AI
Ege Erdil
Aug 19, 2024, 1:25 AM
153
points
34
comments
5
min read
LW
link
Deep atheism and AI risk
Joe Carlsmith
Jan 4, 2024, 6:58 PM
153
points
22
comments
27
min read
LW
link
Decomposing Agency — capabilities without desires
owencb
and
Raymond D
Jul 11, 2024, 9:38 AM
153
points
32
comments
12
min read
LW
link
(strangecities.substack.com)
OpenAI: Exodus
Zvi
May 20, 2024, 1:10 PM
153
points
26
comments
44
min read
LW
link
(thezvi.wordpress.com)
Arithmetic is an underrated world-modeling technology
dynomight
Oct 17, 2024, 2:00 PM
152
points
33
comments
6
min read
LW
link
(dynomight.net)
“Alignment Faking” frame is somewhat fake
Jan_Kulveit
Dec 20, 2024, 9:51 AM
152
points
13
comments
6
min read
LW
link
Using axis lines for good or evil
dynomight
Mar 6, 2024, 2:47 PM
151
points
39
comments
4
min read
LW
link
(dynomight.net)
Priors and Prejudice
MathiasKB
Apr 22, 2024, 3:00 PM
151
points
31
comments
7
min read
LW
link
The Checklist: What Succeeding at AI Safety Will Involve
Sam Bowman
Sep 3, 2024, 6:18 PM
151
points
49
comments
22
min read
LW
link
(sleepinyourhat.github.io)
My takes on SB-1047
leogao
Sep 9, 2024, 6:38 PM
151
points
8
comments
4
min read
LW
link
Daniel Dennett has died (1942-2024)
kave
Apr 19, 2024, 4:17 PM
150
points
5
comments
1
min read
LW
link
(dailynous.com)
2023 Survey Results
Screwtape
Feb 16, 2024, 10:24 PM
150
points
26
comments
44
min read
LW
link
Vernor Vinge, who coined the term “Technological Singularity”, dies at 79
Kaj_Sotala
Mar 21, 2024, 10:14 PM
149
points
25
comments
1
min read
LW
link
(arstechnica.com)
On Devin
Zvi
Mar 18, 2024, 1:20 PM
148
points
34
comments
11
min read
LW
link
(thezvi.wordpress.com)
What good is G-factor if you’re dumped in the woods? A field report from a camp counselor.
Hastings
Jan 12, 2024, 1:17 PM
148
points
22
comments
1
min read
LW
link
Some (problematic) aesthetics of what constitutes good work in academia
Steven Byrnes
Mar 11, 2024, 5:47 PM
148
points
12
comments
12
min read
LW
link
Leading The Parade
johnswentworth
Jan 31, 2024, 10:39 PM
148
points
31
comments
9
min read
LW
link
What o3 Becomes by 2028
Vladimir_Nesov
Dec 22, 2024, 12:37 PM
147
points
15
comments
5
min read
LW
link
0. CAST: Corrigibility as Singular Target
Max Harms
Jun 7, 2024, 10:29 PM
147
points
17
comments
8
min read
LW
link
Stanislav Petrov Quarterly Performance Review
Ricki Heicklen
Sep 26, 2024, 9:20 PM
147
points
3
comments
5
min read
LW
link
(bayesshammai.substack.com)
OpenAI o1
Zach Stein-Perlman
Sep 12, 2024, 5:30 PM
147
points
41
comments
1
min read
LW
link
Repeal the Jones Act of 1920
Zvi
Nov 27, 2024, 3:00 PM
146
points
24
comments
39
min read
LW
link
(thezvi.wordpress.com)
LLMs for Alignment Research: a safety priority?
abramdemski
Apr 4, 2024, 8:03 PM
145
points
24
comments
11
min read
LW
link
The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it
Martín Soto
Aug 27, 2024, 11:10 PM
145
points
15
comments
3
min read
LW
link
When is a mind me?
Rob Bensinger
Apr 17, 2024, 5:56 AM
144
points
130
comments
15
min read
LW
link
Fields that I reference when thinking about AI takeover prevention
Buck
Aug 13, 2024, 11:08 PM
144
points
16
comments
10
min read
LW
link
(redwoodresearch.substack.com)
Why Don’t We Just… Shoggoth+Face+Paraphraser?
Daniel Kokotajlo
and
abramdemski
Nov 19, 2024, 8:53 PM
144
points
58
comments
14
min read
LW
link
That Alien Message—The Animation
Writer
Sep 7, 2024, 2:53 PM
144
points
10
comments
8
min read
LW
link
(youtu.be)
Nursing doubts
dynomight
Aug 30, 2024, 2:25 AM
144
points
23
comments
9
min read
LW
link
(dynomight.net)
China Hawks are Manufacturing an AI Arms Race
garrison
Nov 20, 2024, 6:17 PM
144
points
44
comments
LW
link
(garrisonlovely.substack.com)
The “Think It Faster” Exercise
Raemon
Dec 11, 2024, 7:14 PM
144
points
35
comments
13
min read
LW
link
Value Claims (In Particular) Are Usually Bullshit
johnswentworth
May 30, 2024, 6:26 AM UTC
144
points
18
comments
2
min read
LW
link
Momentum of Light in Glass
Ben
Oct 9, 2024, 8:19 PM UTC
143
points
44
comments
11
min read
LW
link