Archive: Page 4
The Lizardman and the Black Hat Bobcat · Screwtape · Apr 6, 2025, 7:02 PM · 107 points · 15 comments · 9 min read · LW link
How training-gamers might function (and win) · Vivek Hebbar · Apr 11, 2025, 9:26 PM · 107 points · 5 comments · 13 min read · LW link
Attribution-based parameter decomposition · Lucius Bushnaq, Dan Braun, StefanHex, jake_mendel and Lee Sharkey · Jan 25, 2025, 1:12 PM · 107 points · 21 comments · 4 min read · LW link (publications.apolloresearch.ai)
We’re Not Advertising Enough (Post 3 of 6 on AI Governance) · Mass_Driver · May 22, 2025, 5:05 PM · 107 points · 10 comments · 28 min read · LW link
My supervillain origin story · Dmitry Vaintrob · Jan 27, 2025, 12:20 PM · 106 points · 2 comments · 5 min read · LW link
How do you deal w/ Super Stimuli? · Logan Riggs · Jan 14, 2025, 3:14 PM · 106 points · 25 comments · 3 min read · LW link
AI 2027: Responses · Zvi · Apr 8, 2025, 12:50 PM · 106 points · 3 comments · 30 min read · LW link (thezvi.wordpress.com)
Prioritizing Work · jefftk · May 1, 2025, 2:00 AM · 106 points · 11 comments · 1 min read · LW link (www.jefftk.com)
AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions · peterbarnett and Aaron_Scher · May 1, 2025, 10:46 PM · 105 points · 7 comments · 8 min read · LW link (techgov.intelligence.org)
Steering Gemini with BiDPO · TurnTrout · Jan 31, 2025, 2:37 AM · 104 points · 5 comments · 1 min read · LW link (turntrout.com)
My model of what is going on with LLMs · Cole Wyeth · Feb 13, 2025, 3:43 AM · 104 points · 49 comments · 7 min read · LW link
Show, not tell: GPT-4o is more opinionated in images than in text · Daniel Tan and eggsyntax · Apr 2, 2025, 8:51 AM · 103 points · 41 comments · 3 min read · LW link
A short course on AGI safety from the GDM Alignment team · Vika and Rohin Shah · Feb 14, 2025, 3:43 PM · 103 points · 2 comments · 1 min read · LW link (deepmindsafetyresearch.medium.com)
Comment on “Death and the Gorgon” · Zack_M_Davis · Jan 1, 2025, 5:47 AM · 103 points · 33 comments · 8 min read · LW link
Judgements: Merging Prediction & Evidence · abramdemski · Feb 23, 2025, 7:35 PM · 103 points · 5 comments · 6 min read · LW link
AGI Safety & Alignment @ Google DeepMind is hiring · Rohin Shah · Feb 17, 2025, 9:11 PM · 102 points · 19 comments · 10 min read · LW link
How I talk to those above me · Maxwell Peterson · Mar 30, 2025, 6:54 AM · 102 points · 16 comments · 8 min read · LW link
Detecting Strategic Deception Using Linear Probes · Nicholas Goldowsky-Dill, bilalchughtai, StefanHex and Marius Hobbhahn · Feb 6, 2025, 3:46 PM · 102 points · 9 comments · 2 min read · LW link (arxiv.org)
RA x ControlAI video: What if AI just keeps getting smarter? · Writer · May 2, 2025, 2:19 PM · 100 points · 17 comments · 9 min read · LW link
Reasons for and against working on technical AI safety at a frontier AI lab · bilalchughtai · Jan 5, 2025, 2:49 PM · 100 points · 12 comments · 12 min read · LW link
C’mon guys, Deliberate Practice is Real · Raemon · Feb 5, 2025, 10:33 PM · 99 points · 25 comments · 9 min read · LW link
Generating the Funniest Joke with RL (according to GPT-4.1) · agg · May 16, 2025, 5:09 AM · 99 points · 22 comments · 4 min read · LW link
Association taxes are collusion subsidies · KatjaGrace · May 27, 2025, 6:50 AM · 99 points · 7 comments · 1 min read · LW link (worldspiritsockpuppet.com)
Timaeus in 2024 · Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel and Daniel Murfet · Feb 20, 2025, 11:54 PM · 99 points · 1 comment · 8 min read · LW link
Third-wave AI safety needs sociopolitical thinking · Richard_Ngo · Mar 27, 2025, 12:55 AM · 99 points · 23 comments · 26 min read · LW link
The Ukraine War and the Kill Market · Martin Sustrik · May 4, 2025, 7:50 AM · 98 points · 13 comments · 5 min read · LW link (250bpm.substack.com)
The purposeful drunkard · Dmitry Vaintrob · Jan 12, 2025, 12:27 PM · 98 points · 13 comments · 6 min read · LW link
AI Control May Increase Existential Risk · Jan_Kulveit · Mar 11, 2025, 2:30 PM · 98 points · 13 comments · 1 min read · LW link
What the Headlines Miss About the Latest Decision in the Musk vs. OpenAI Lawsuit · garrison · Mar 6, 2025, 7:49 PM · 98 points · 0 comments · LW link (garrisonlovely.substack.com)
Vacuum Decay: Expert Survey Results · JessRiedel · Mar 13, 2025, 6:31 PM · 96 points · 26 comments · LW link
Reviewing LessWrong: Screwtape’s Basic Answer · Screwtape · Feb 5, 2025, 4:30 AM · 96 points · 18 comments · 6 min read · LW link
Towards a scale-free theory of intelligent agency · Richard_Ngo · Mar 21, 2025, 1:39 AM · 96 points · 44 comments · 13 min read · LW link (www.mindthefuture.info)
How to Build a Third Place on Focusmate · Parker Conley · Apr 28, 2025, 11:46 PM · 96 points · 10 comments · 5 min read · LW link (parconley.com)
The Sweet Lesson: AI Safety Should Scale With Compute · Jesse Hoogland · May 5, 2025, 7:03 PM · 95 points · 3 comments · 3 min read · LW link
The subset parity learning problem: much more than you wanted to know · Dmitry Vaintrob · Jan 3, 2025, 9:13 AM · 94 points · 18 comments · 11 min read · LW link
Tips and Code for Empirical Research Workflows · John Hughes and Ethan Perez · Jan 20, 2025, 10:31 PM · 94 points · 14 comments · 20 min read · LW link
On Eating the Sun · jessicata · Jan 8, 2025, 4:57 AM · 94 points · 96 comments · 3 min read · LW link (unstablerontology.substack.com)
We probably won’t just play status games with each other after AGI · Matthew Barnett · Jan 15, 2025, 4:56 AM · 93 points · 21 comments · 4 min read · LW link
Implications of the inference scaling paradigm for AI safety · Ryan Kidd · Jan 14, 2025, 2:14 AM · 93 points · 70 comments · 5 min read · LW link
Five Recent AI Tutoring Studies · Arjun Panickssery · Jan 19, 2025, 3:53 AM · 93 points · 0 comments · 2 min read · LW link (arjunpanickssery.substack.com)
Elite Coordination via the Consensus of Power · Richard_Ngo · Mar 19, 2025, 6:56 AM · 92 points · 15 comments · 12 min read · LW link (www.mindthefuture.info)
The Rising Sea · Jesse Hoogland · Jan 25, 2025, 8:48 PM · 92 points · 2 comments · 2 min read · LW link
a confusion about preference orderings · nostalgebraist · May 11, 2025, 7:30 PM · 92 points · 38 comments · 11 min read · LW link
Introducing Squiggle AI · ozziegooen · Jan 3, 2025, 5:53 PM · 92 points · 15 comments · LW link
ASI existential risk: Reconsidering Alignment as a Goal · habryka · Apr 15, 2025, 7:57 PM · 91 points · 14 comments · 19 min read · LW link (michaelnotebook.com)
How I force LLMs to generate correct code · claudio · Mar 21, 2025, 2:40 PM · 91 points · 7 comments · 5 min read · LW link
Thoughts on the conservative assumptions in AI control · Buck · Jan 17, 2025, 7:23 PM · 91 points · 5 comments · 13 min read · LW link
Slow corporations as an intuition pump for AI R&D automation · ryan_greenblatt and elifland · May 9, 2025, 2:49 PM · 91 points · 23 comments · 9 min read · LW link
Six Thoughts on AI Safety · boazbarak · Jan 24, 2025, 10:20 PM · 91 points · 55 comments · 15 min read · LW link
Tips On Empirical Research Slides · James Chua, John Hughes, Ethan Perez and Owain_Evans · Jan 8, 2025, 5:06 AM · 90 points · 4 comments · 6 min read · LW link