“Superhuman” Isn’t Well Specified
JustisMills
May 3, 2025, 11:42 PM
32
points
9
comments
3
min read
LW
link
(justismills.substack.com)
Navigating burnout
gw
May 3, 2025, 10:07 PM
73
points
1
comment
9
min read
LW
link
(www.georgeyw.com)
What is your favorite podcast?
ChristianKl
May 3, 2025, 9:25 PM
32
points
9
comments
1
min read
LW
link
[Question]
Does translating a post with an LLM affect its rating?
ReverendBayes
May 3, 2025, 2:45 PM
9
points
9
comments
2
min read
LW
link
SimpleStories: A Better Synthetic Dataset and Tiny Models for Interpretability
Lennart Finke
May 3, 2025, 2:04 PM
13
points
0
comments
1
min read
LW
link
What’s up with AI’s vision
Joachim Bartosik
May 3, 2025, 1:23 PM
12
points
19
comments
1
min read
LW
link
Sparsity is the enemy of feature extraction (ft. absorption)
7vik
,
chanind
and
Adrià Garriga-alonso
May 3, 2025, 10:13 AM
31
points
0
comments
6
min read
LW
link
Exploring out-of-context reasoning (OOCR) fine-tuning in LLMs to increase test-phase awareness
Sanyu Rajakumar
May 3, 2025, 3:33 AM
8
points
0
comments
6
min read
LW
link
Prison Journal: Building Better Thinking Skills—Altruistic Person Saved > 100 Gorillas saved
P. João
May 3, 2025, 1:34 AM
−30
points
2
comments
1
min read
LW
link
Updates from Comments on “AI 2027 is a Bet Against Amdahl’s Law”
snewman
May 2, 2025, 11:52 PM
40
points
2
comments
13
min read
LW
link
Attend SPAR’s virtual demo day! (career fair + talks)
agucova
May 2, 2025, 11:45 PM
9
points
0
comments
LW
link
(demoday.sparai.org)
Why does METR score o3 as effective for such a long time duration despite overall poor scores?
Cole Wyeth
May 2, 2025, 10:58 PM
19
points
3
comments
1
min read
LW
link
Short story: Who is nancygonzalez8451097
Anders Lindström
May 2, 2025, 9:01 PM
13
points
2
comments
5
min read
LW
link
Interim Research Report: Mechanisms of Awareness
Josh Engels
,
Neel Nanda
and
Senthooran Rajamanoharan
May 2, 2025, 8:29 PM
43
points
6
comments
8
min read
LW
link
Agents, Tools, and Simulators
WillPetillo
,
Sean Herrington
,
Adebayo Mubarak
,
Cancus
and
Spencer Ames
May 2, 2025, 8:19 PM
12
points
0
comments
10
min read
LW
link
Obstacles in ARC’s agenda: Low Probability Estimation
David Matolcsi
May 2, 2025, 7:38 PM
43
points
0
comments
6
min read
LW
link
What’s going on with AI progress and trends? (As of 5/2025)
ryan_greenblatt
May 2, 2025, 7:00 PM
71
points
7
comments
8
min read
LW
link
When AI Optimizes for the Wrong Thing
Anthony Fox
May 2, 2025, 6:00 PM
5
points
0
comments
1
min read
LW
link
Alignment Structure Direction—Recursive Adversarial Oversight(RAO)
Jayden Shepard
May 2, 2025, 5:51 PM
2
points
0
comments
2
min read
LW
link
AI Welfare Risks
Adrià Moret
May 2, 2025, 5:49 PM
5
points
0
comments
1
min read
LW
link
(philpapers.org)
Philosoplasticity: On the Inevitable Drift of Meaning in Recursive Self-Interpreting Systems
Maikol Coin
May 2, 2025, 5:46 PM
−1
points
0
comments
4
min read
LW
link
Supermen of the (Not so Far) Future
TerriLeaf
May 2, 2025, 3:55 PM
9
points
0
comments
4
min read
LW
link
Steering Language Models in Multiple Directions Simultaneously
lukemarks
,
Narmeen
and
Amirali Abdullah
May 2, 2025, 3:27 PM
18
points
0
comments
7
min read
LW
link
AI Incident Monitoring: A Brief Analysis
Spencer Ames
May 2, 2025, 3:06 PM
3
points
0
comments
5
min read
LW
link
RA x ControlAI video: What if AI just keeps getting smarter?
Writer
May 2, 2025, 2:19 PM
100
points
17
comments
9
min read
LW
link
OpenAI Preparedness Framework 2.0
Zvi
May 2, 2025, 1:10 PM
60
points
1
comment
23
min read
LW
link
(thezvi.wordpress.com)
Ex-OpenAI employee amici leave to file denied in Musk v OpenAI case?
TFD
May 2, 2025, 12:27 PM
4
points
6
comments
2
min read
LW
link
(www.thefloatingdroid.com)
Roads are at maximum efficiency always
Hruss
May 2, 2025, 10:29 AM
1
point
3
comments
1
min read
LW
link
The Continuum Fallacy and its Relatives
Zero Contradictions
May 2, 2025, 2:58 AM
4
points
2
comments
4
min read
LW
link
(thewaywardaxolotl.blogspot.com)
Memory Decoding Journal Club: Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons
Devin Ward
May 1, 2025, 11:52 PM
1
point
0
comments
1
min read
LW
link
My Research Process: Understanding and Cultivating Research Taste
Neel Nanda
May 1, 2025, 11:08 PM
26
points
1
comment
9
min read
LW
link
AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions
peterbarnett
and
Aaron_Scher
May 1, 2025, 10:46 PM
105
points
7
comments
8
min read
LW
link
(techgov.intelligence.org)
How to specify an alignment target
Richard Juggins
May 1, 2025, 9:11 PM
14
points
2
comments
12
min read
LW
link
Obstacles in ARC’s agenda: Mechanistic Anomaly Detection
David Matolcsi
May 1, 2025, 8:51 PM
42
points
1
comment
11
min read
LW
link
AI-Generated GitHub repo backdated with junk then filled with my systems work. Has anyone seen this before?
rgunther
May 1, 2025, 8:14 PM
7
points
1
comment
1
min read
LW
link
What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism
Brittany Gelb
May 1, 2025, 7:06 PM
17
points
0
comments
7
min read
LW
link
Can LLMs Simulate Internal Evaluation? A Case Study in Self-Generated Recommendations
The Neutral Mind
May 1, 2025, 7:04 PM
4
points
0
comments
2
min read
LW
link
Superhuman Coders in AI 2027 - Not So Fast
dschwarz
and
FutureSearch
May 1, 2025, 6:56 PM
62
points
0
comments
5
min read
LW
link
AI #114: Liars, Sycophants and Cheaters
Zvi
May 1, 2025, 2:00 PM
40
points
5
comments
63
min read
LW
link
(thezvi.wordpress.com)
Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall
Vladimir_Nesov
May 1, 2025, 1:54 PM
175
points
22
comments
5
min read
LW
link
Anthropomorphizing AI might be good, actually
Seth Herd
May 1, 2025, 1:50 PM
35
points
6
comments
3
min read
LW
link
Dont focus on updating P doom
Algon
May 1, 2025, 11:10 AM
7
points
3
comments
2
min read
LW
link
Prioritizing Work
jefftk
May 1, 2025, 2:00 AM
106
points
11
comments
1
min read
LW
link
(www.jefftk.com)
Don’t rely on a “race to the top”
sjadler
May 1, 2025, 12:33 AM
4
points
0
comments
1
min read
LW
link
Meta-Technicalities: Safeguarding Values in Formal Systems
LTM
Apr 30, 2025, 11:43 PM
2
points
0
comments
3
min read
LW
link
(routecause.substack.com)
Obstacles in ARC’s agenda: Finding explanations
David Matolcsi
Apr 30, 2025, 11:03 PM
122
points
10
comments
17
min read
LW
link
GPT-4o Responds to Negative Feedback
Zvi
30 Apr 2025 20:20 UTC
45
points
2
comments
18
min read
LW
link
(thezvi.wordpress.com)
State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]
Noosphere89
30 Apr 2025 19:58 UTC
7
points
0
comments
5
min read
LW
link
(www.interconnects.ai)
Don’t accuse your interlocutor of being insufficiently truth-seeking
TFD
30 Apr 2025 19:38 UTC
30
points
15
comments
2
min read
LW
link
(www.thefloatingdroid.com)
How can we solve diffuse threats like research sabotage with AI control?
Vivek Hebbar
30 Apr 2025 19:23 UTC
52
points
1
comment
8
min read
LW
link