LessWrong Archive: March 2025, Page 1
The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational · mc1soft · Mar 22, 2025, 10:55 PM · 3 points · 0 comments · 2 min read · LW link
Dayton, Ohio, ACX Meetup · Lunawarrior · Mar 22, 2025, 7:45 PM · 1 point · 0 comments · 1 min read · LW link
[Replication] Crosscoder-based Stage-Wise Model Diffing · Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree, and Jason Gross · Mar 22, 2025, 6:35 PM · 21 points · 0 comments · 7 min read · LW link
The Principle of Satisfying Foreknowledge · Randall Reams · Mar 22, 2025, 6:20 PM · 1 point · 0 comments · 2 min read · LW link
[Question] Urgency in the ITN framework · Shaïman · Mar 22, 2025, 6:16 PM · 0 points · 2 comments · 1 min read · LW link
Transhumanism and AI: Toward Prosperity or Extinction? · Shaïman · Mar 22, 2025, 6:16 PM · 10 points · 2 comments · 6 min read · LW link
Tied Crosscoders: Explaining Chat Behavior from Base Model · Santiago Aranguri · Mar 22, 2025, 6:07 PM · 9 points · 0 comments · 12 min read · LW link
100+ concrete projects and open problems in evals · Marius Hobbhahn · Mar 22, 2025, 3:21 PM · 74 points · 1 comment · 1 min read · LW link
Do models say what they learn? · Andy Arditi, marvinli, Joe Benton, and Miles Turpin · Mar 22, 2025, 3:19 PM · 126 points · 12 comments · 13 min read · LW link
AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence · funnyfranco · Mar 22, 2025, 12:06 PM · 1 point · 9 comments · 18 min read · LW link
2025 Q3 Pivotal Research Fellowship: Applications Open · Tobias H · Mar 22, 2025, 10:54 AM · 4 points · 0 comments · 2 min read · LW link
Good Research Takes are Not Sufficient for Good Strategic Takes · Neel Nanda · Mar 22, 2025, 10:13 AM · 292 points · 28 comments · 4 min read · LW link (www.neelnanda.io)
Grammatical Roles and Social Roles: A Structural Analogy · Lucien · Mar 22, 2025, 7:44 AM · 0 points · 0 comments · 1 min read · LW link
Legibility · lsusr · Mar 22, 2025, 6:54 AM · 19 points · 22 comments · 2 min read · LW link
Why Were We Wrong About China and AI? A Case Study in Failed Rationality · thedudeabides · Mar 22, 2025, 5:13 AM · 31 points · 45 comments · 1 min read · LW link
A Short Diatribe on Hidden Assertions. · Eggs · Mar 22, 2025, 3:14 AM · −9 points · 2 comments · 3 min read · LW link
Transformer Attention’s High School Math Mistake · Max Ma · Mar 22, 2025, 12:16 AM · −13 points · 1 comment · 1 min read · LW link
Making Sense of President Trump’s Annexation Obsession · Annapurna · Mar 21, 2025, 9:10 PM · −13 points · 3 comments · 5 min read · LW link (jorgevelez.substack.com)
How I force LLMs to generate correct code · claudio · Mar 21, 2025, 2:40 PM · 91 points · 7 comments · 5 min read · LW link
Prospects for Alignment Automation: Interpretability Case Study · Jacob Pfau and Geoffrey Irving · Mar 21, 2025, 2:05 PM · 32 points · 5 comments · 8 min read · LW link
Epoch AI released a GATE Scenario Explorer · Lee.aao · Mar 21, 2025, 1:57 PM · 10 points · 0 comments · 1 min read · LW link (epoch.ai)
They Took MY Job? · Zvi · Mar 21, 2025, 1:30 PM · 37 points · 4 comments · 9 min read · LW link (thezvi.wordpress.com)
Silly Time · jefftk · Mar 21, 2025, 12:30 PM · 45 points · 2 comments · 2 min read · LW link (www.jefftk.com)
Towards a scale-free theory of intelligent agency · Richard_Ngo · Mar 21, 2025, 1:39 AM · 96 points · 44 comments · 13 min read · LW link (www.mindthefuture.info)
[Question] Any mistakes in my understanding of Transformers? · Kallistos · Mar 21, 2025, 12:34 AM · 3 points · 7 comments · 1 min read · LW link
A Critique of “Utility” · Zero Contradictions · Mar 20, 2025, 11:21 PM · −2 points · 10 comments · 2 min read · LW link (thewaywardaxolotl.blogspot.com)
Intention to Treat · Alicorn · Mar 20, 2025, 8:01 PM · 195 points · 5 comments · 2 min read · LW link
Anthropic: Progress from our Frontier Red Team · UnofficialLinkpostBot · Mar 20, 2025, 7:12 PM · 16 points · 3 comments · 6 min read · LW link (www.anthropic.com)
Everything’s An Emergency · Bentham's Bulldog · Mar 20, 2025, 5:12 PM · 18 points · 0 comments · 2 min read · LW link
Non-Consensual Consent: The Performance of Choice in a Coercive World · Alex_Steiner · Mar 20, 2025, 5:12 PM · 27 points · 4 comments · 13 min read · LW link
Minor interpretability exploration #4: LayerNorm and the learning coefficient · Rareș Baron · Mar 20, 2025, 4:18 PM · 2 points · 0 comments · 1 min read · LW link
[Question] How far along Metr’s law can AI start automating or helping with alignment research? · Christopher King · Mar 20, 2025, 3:58 PM · 20 points · 21 comments · 1 min read · LW link
Human alignment · Lucien · Mar 20, 2025, 3:52 PM · −16 points · 2 comments · 1 min read · LW link
[Question] Seeking: more Sci Fi micro reviews · Yair Halberstadt · Mar 20, 2025, 2:31 PM · 7 points · 0 comments · 1 min read · LW link
AI #108: Straight Line on a Graph · Zvi · Mar 20, 2025, 1:50 PM · 43 points · 5 comments · 39 min read · LW link (thezvi.wordpress.com)
What is an alignment tax? · Vishakha and Algon · Mar 20, 2025, 1:06 PM · 5 points · 0 comments · 1 min read · LW link (aisafety.info)
Longtermist Implications of the Existence Neutrality Hypothesis · Maxime Riché · Mar 20, 2025, 12:20 PM · 3 points · 2 comments · 21 min read · LW link
You don’t have to be “into EA” to attend EAG(x) Conferences · gergogaspar · Mar 20, 2025, 10:44 AM · 1 point · 0 comments · 1 min read · LW link
Defense Against The Super-Worms · viemccoy · Mar 20, 2025, 7:24 AM · 23 points · 1 comment · 2 min read · LW link
Socially Graceful Degradation · Screwtape · Mar 20, 2025, 4:03 AM · 57 points · 9 comments · 9 min read · LW link
Apply to MATS 8.0! · Ryan Kidd and K Richards · Mar 20, 2025, 2:17 AM · 63 points · 5 comments · 4 min read · LW link
Improved visualizations of METR Time Horizons paper. · LDJ · Mar 19, 2025, 11:36 PM · 20 points · 4 comments · 2 min read · LW link
Is CCP authoritarianism good for building safe AI? · Hruss · Mar 19, 2025, 11:13 PM · 1 point · 0 comments · 1 min read · LW link
The case against “The case against AI alignment” · KvmanThinking · Mar 19, 2025, 10:40 PM UTC · 2 points · 0 comments · 1 min read · LW link
[Question] Superintelligence Strategy: A Pragmatic Path to… Doom? · Mr Beastly · Mar 19, 2025, 10:30 PM UTC · 6 points · 0 comments · 3 min read · LW link
SHIFT relies on token-level features to de-bias Bias in Bios probes · Tim Hua · Mar 19, 2025, 9:29 PM UTC · 39 points · 2 comments · 6 min read · LW link
Janet must die · Shmi · Mar 19, 2025, 8:35 PM UTC · 12 points · 3 comments · 2 min read · LW link
[Question] Why am I getting downvoted on Lesswrong? · Oxidize · Mar 19, 2025, 6:32 PM UTC · 7 points · 14 comments · 1 min read · LW link
Forecasting AI Futures Resource Hub · Alvin Ånestrand · Mar 19, 2025, 5:26 PM UTC · 2 points · 0 comments · 2 min read · LW link (forecastingaifutures.substack.com)
TBC episode w Dave Kasten from Control AI on AI Policy · Eneasz · Mar 19, 2025, 5:09 PM UTC · 8 points · 0 comments · 1 min read · LW link (www.thebayesianconspiracy.com)