The Grapes of Hardness
adamShimi
Mar 11, 2025, 9:01 PM
8
points
0
comments
5
min read
LW
link
(formethods.substack.com)
Don’t over-update on FrontierMath results
David Matolcsi
Mar 11, 2025, 8:44 PM
51
points
7
comments
9
min read
LW
link
Response to Scott Alexander on Imprisonment
Zvi
Mar 11, 2025, 8:40 PM
40
points
4
comments
9
min read
LW
link
(thezvi.wordpress.com)
Paths and waystations in AI safety
Joe Carlsmith
Mar 11, 2025, 6:52 PM
41
points
1
comment
11
min read
LW
link
(joecarlsmith.substack.com)
Meridian Cambridge Visiting Researcher Programme: Turn AI safety ideas into funded projects in one week!
Meridian Cambridge
Mar 11, 2025, 5:46 PM
13
points
0
comments
2
min read
LW
link
Elon Musk May Be Transitioning to Bipolar Type I
Cyborg25
Mar 11, 2025, 5:45 PM
83
points
22
comments
4
min read
LW
link
Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated?
Katalina Hernandez
Mar 11, 2025, 4:51 PM
3
points
1
comment
3
min read
LW
link
How Language Models Understand Nullability
Anish Tondwalkar and Alex Sanchez-Stern
Mar 11, 2025, 3:57 PM
5
points
0
comments
2
min read
LW
link
(dmodel.ai)
Forethought: a new AI macrostrategy group
Max Dalton, Tom Davidson, wdmacaskill and AmritSidhu-Brar
Mar 11, 2025, 3:39 PM
18
points
0
comments
3
min read
LW
link
Preparing for the Intelligence Explosion
fin and wdmacaskill
Mar 11, 2025, 3:38 PM
78
points
17
comments
1
min read
LW
link
(www.forethought.org)
stop solving problems that have already been solved
dhruvmethi
Mar 11, 2025, 3:30 PM
10
points
3
comments
8
min read
LW
link
AI Control May Increase Existential Risk
Jan_Kulveit
Mar 11, 2025, 2:30 PM
98
points
13
comments
1
min read
LW
link
When is it Better to Train on the Alignment Proxy?
dil-leik-og
Mar 11, 2025, 1:35 PM
14
points
0
comments
9
min read
LW
link
A different take on the Musk v OpenAI preliminary injunction order
TFD
Mar 11, 2025, 12:46 PM
8
points
0
comments
20
min read
LW
link
(www.thefloatingdroid.com)
Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases
Fabien Roger
Mar 11, 2025, 11:52 AM
121
points
23
comments
11
min read
LW
link
(alignment.anthropic.com)
A Hogwarts Guide to Citizenship
WillPetillo
Mar 11, 2025, 5:50 AM
7
points
1
comment
3
min read
LW
link
Cognitive Reframing—How to Overcome Negative Thought Patterns and Behaviors
Declan Molony
Mar 11, 2025, 4:56 AM
11
points
0
comments
4
min read
LW
link
Trojan Sky
Richard_Ngo
Mar 11, 2025, 3:14 AM
245
points
39
comments
12
min read
LW
link
(www.narrativeark.xyz)
OpenAI: Detecting misbehavior in frontier reasoning models
Daniel Kokotajlo
Mar 11, 2025, 2:17 AM
183
points
26
comments
4
min read
LW
link
(openai.com)
HPMOR Anniversary Parties: Coordination, Resources, and Discussion
Screwtape
Mar 11, 2025, 1:30 AM
52
points
6
comments
7
min read
LW
link
Positional kernels of attention heads
Alex Gibson
Mar 10, 2025, 11:17 PM
9
points
0
comments
12
min read
LW
link
Progress links and short notes, 2025-03-10
jasoncrawford
Mar 10, 2025, 8:27 PM
8
points
0
comments
4
min read
LW
link
(newsletter.rootsofprogress.org)
The Manus Marketing Madness
Zvi
Mar 10, 2025, 8:10 PM
54
points
0
comments
24
min read
LW
link
(thezvi.wordpress.com)
You can just play
aswath krishnan
Mar 10, 2025, 8:00 PM
−5
points
0
comments
2
min read
LW
link
How to Use Prompt Engineering to Rewire Your Brain
aswath krishnan
Mar 10, 2025, 8:00 PM
1
point
0
comments
5
min read
LW
link
(www.aswathkrishnan.com)
When Independent Optimization Is Worse Than Randomness
Chaotic rationalist
Mar 10, 2025, 7:46 PM
−4
points
0
comments
2
min read
LW
link
Stress exists only where the Mind makes it
Noahh
Mar 10, 2025, 7:44 PM
5
points
2
comments
4
min read
LW
link
Counterargument to Godel’s Modal Ontological Argument
Wynn
Mar 10, 2025, 7:38 PM
−1
points
0
comments
4
min read
LW
link
[Question] How much do frontier LLMs code and browse while in training?
Joe Rogero
Mar 10, 2025, 7:34 PM
7
points
0
comments
1
min read
LW
link
Observations on self-supervised Learning for vision
Dinkar Juyal
Mar 10, 2025, 7:31 PM
3
points
0
comments
5
min read
LW
link
Introducing 11 New AI Safety Organizations—Catalyze’s Winter 24/25 London Incubation Program Cohort
Alexandra Bos
Mar 10, 2025, 7:26 PM
70
points
0
comments
LW
link
The Jackpot Jinx (or why “Superintelligence Strategy” is wrong)
E.G. Blee-Goldman
Mar 10, 2025, 7:18 PM
13
points
0
comments
5
min read
LW
link
Effective AI Outreach | A Data Driven Approach
NoahCWilson
Mar 10, 2025, 7:18 PM
1
point
0
comments
15
min read
LW
link
Emergent AI Society. Tasks, Scarcity, Talks
Andrey Seryakov
Mar 10, 2025, 7:18 PM
1
point
0
comments
5
min read
LW
link
Sentinel minutes #10/2025: Trump tariffs, US/China tensions, Claude code reward hacking.
NunoSempere
Mar 10, 2025, 7:00 PM
25
points
0
comments
10
min read
LW
link
(blog.sentinel-team.org)
Have you actually tried raising the birth rate?
Yair Halberstadt
Mar 10, 2025, 6:06 PM
6
points
5
comments
1
min read
LW
link
Split Personality Training: Revealing Latent Knowledge Through Personality-Shift Tokens
Florian_Dietz
Mar 10, 2025, 4:07 PM
37
points
4
comments
9
min read
LW
link
We Have No Plan for Preventing Loss of Control in Open Models
Andrew Dickson
Mar 10, 2025, 3:35 PM
45
points
11
comments
22
min read
LW
link
Lock-In Threat Models
alamerton
Mar 10, 2025, 10:22 AM
5
points
0
comments
8
min read
LW
link
Book Review: Affective Neuroscience
sarahconstantin
Mar 10, 2025, 6:50 AM
62
points
8
comments
13
min read
LW
link
(sarahconstantin.substack.com)
The chessboard world
phdead
Mar 10, 2025, 1:26 AM
5
points
0
comments
8
min read
LW
link
[Question] when will LLMs become human-level bloggers?
nostalgebraist
Mar 9, 2025, 9:10 PM
124
points
34
comments
6
min read
LW
link
Everything I Know About Semantics I Learned From Music Notation
J Bostock
Mar 9, 2025, 6:09 PM
34
points
2
comments
10
min read
LW
link
Phoenix Rising
Metacelsus
Mar 9, 2025, 11:53 AM
66
points
7
comments
5
min read
LW
link
(denovo.substack.com)
How well can Claude write coding questions?
bodry
Mar 9, 2025, 5:29 AM
3
points
1
comment
12
min read
LW
link
A model of the final phase: the current frontier AIs as de facto CEOs of their own companies
Mitchell_Porter
Mar 8, 2025, 10:15 PM
23
points
2
comments
1
min read
LW
link
Harry Potter and the Methods of Rationality 10 Year Anniversary Party!
Robert Cousineau
Mar 8, 2025, 9:29 PM
6
points
0
comments
1
min read
LW
link
A case for peer-reviewed conspiracy theories
Sam G
Mar 8, 2025, 8:41 PM
13
points
2
comments
4
min read
LW
link
The machine has no mouth and it must scream
zef
Mar 8, 2025, 4:40 PM
77
points
1
comment
7
min read
LW
link
(zephyyr.substack.com)
How Do We Fix the Education Crisis?
James Camacho
Mar 8, 2025, 2:59 AM
12
points
4
comments
8
min read
LW
link