- Updates from Comments on “AI 2027 is a Bet Against Amdahl’s Law” (snewman) · May 2, 2025, 11:52 PM · 40 points · 2 comments · 13 min read
- Attend SPAR’s virtual demo day! (career fair + talks) (agucova) · May 2, 2025, 11:45 PM · 9 points · 0 comments · (demoday.sparai.org)
- Why does METR score o3 as effective for such a long time duration despite overall poor scores? (Cole Wyeth) · May 2, 2025, 10:58 PM · 19 points · 3 comments · 1 min read
- Short story: Who is nancygonzalez8451097 (Anders Lindström) · May 2, 2025, 9:01 PM · 13 points · 2 comments · 5 min read
- Interim Research Report: Mechanisms of Awareness (Josh Engels, Neel Nanda and Senthooran Rajamanoharan) · May 2, 2025, 8:29 PM · 43 points · 6 comments · 8 min read
- Agents, Tools, and Simulators (WillPetillo, Sean Herrington, Adebayo Mubarak, Cancus and Spencer Ames) · May 2, 2025, 8:19 PM · 12 points · 0 comments · 10 min read
- Obstacles in ARC’s agenda: Low Probability Estimation (David Matolcsi) · May 2, 2025, 7:38 PM · 43 points · 0 comments · 6 min read
- What’s going on with AI progress and trends? (As of 5/2025) (ryan_greenblatt) · May 2, 2025, 7:00 PM · 71 points · 7 comments · 8 min read
- When AI Optimizes for the Wrong Thing (Anthony Fox) · May 2, 2025, 6:00 PM · 5 points · 0 comments · 1 min read
- Alignment Structure Direction—Recursive Adversarial Oversight(RAO) (Jayden Shepard) · May 2, 2025, 5:51 PM · 2 points · 0 comments · 2 min read
- AI Welfare Risks (Adrià Moret) · May 2, 2025, 5:49 PM · 5 points · 0 comments · 1 min read · (philpapers.org)
- Philosoplasticity: On the Inevitable Drift of Meaning in Recursive Self-Interpreting Systems (Maikol Coin) · May 2, 2025, 5:46 PM · −1 points · 0 comments · 4 min read
- Supermen of the (Not so Far) Future (TerriLeaf) · May 2, 2025, 3:55 PM · 9 points · 0 comments · 4 min read
- Steering Language Models in Multiple Directions Simultaneously (lukemarks, Narmeen and Amirali Abdullah) · May 2, 2025, 3:27 PM · 18 points · 0 comments · 7 min read
- AI Incident Monitoring: A Brief Analysis (Spencer Ames) · May 2, 2025, 3:06 PM · 3 points · 0 comments · 5 min read
- RA x ControlAI video: What if AI just keeps getting smarter? (Writer) · May 2, 2025, 2:19 PM · 100 points · 17 comments · 9 min read
- OpenAI Preparedness Framework 2.0 (Zvi) · May 2, 2025, 1:10 PM · 60 points · 1 comment · 23 min read · (thezvi.wordpress.com)
- Ex-OpenAI employee amici leave to file denied in Musk v OpenAI case? (TFD) · May 2, 2025, 12:27 PM · 4 points · 6 comments · 2 min read · (www.thefloatingdroid.com)
- Roads are at maximum efficiency always (Hruss) · May 2, 2025, 10:29 AM · 1 point · 3 comments · 1 min read
- The Continuum Fallacy and its Relatives (Zero Contradictions) · May 2, 2025, 2:58 AM · 4 points · 2 comments · 4 min read · (thewaywardaxolotl.blogspot.com)
- Memory Decoding Journal Club: Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons (Devin Ward) · May 1, 2025, 11:52 PM · 1 point · 0 comments · 1 min read
- My Research Process: Understanding and Cultivating Research Taste (Neel Nanda) · May 1, 2025, 11:08 PM · 26 points · 1 comment · 9 min read
- AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions (peterbarnett and Aaron_Scher) · May 1, 2025, 10:46 PM · 105 points · 7 comments · 8 min read · (techgov.intelligence.org)
- How to specify an alignment target (juggins) · May 1, 2025, 9:11 PM · 14 points · 2 comments · 12 min read
- Obstacles in ARC’s agenda: Mechanistic Anomaly Detection (David Matolcsi) · May 1, 2025, 8:51 PM · 42 points · 1 comment · 11 min read
- AI-Generated GitHub repo backdated with junk then filled with my systems work. Has anyone seen this before? (rgunther) · May 1, 2025, 8:14 PM · 7 points · 1 comment · 1 min read
- What is Inadequate about Bayesianism for AI Alignment: Motivating Infra-Bayesianism (Brittany Gelb) · May 1, 2025, 7:06 PM · 17 points · 0 comments · 7 min read
- Can LLMs Simulate Internal Evaluation? A Case Study in Self-Generated Recommendations (The Neutral Mind) · May 1, 2025, 7:04 PM · 4 points · 0 comments · 2 min read
- Superhuman Coders in AI 2027 - Not So Fast (dschwarz and FutureSearch) · May 1, 2025, 6:56 PM · 59 points · 0 comments · 5 min read
- AI #114: Liars, Sycophants and Cheaters (Zvi) · May 1, 2025, 2:00 PM · 40 points · 5 comments · 63 min read · (thezvi.wordpress.com)
- Slowdown After 2028: Compute, RLVR Uncertainty, MoE Data Wall (Vladimir_Nesov) · May 1, 2025, 1:54 PM · 175 points · 22 comments · 5 min read
- Anthropomorphizing AI might be good, actually (Seth Herd) · May 1, 2025, 1:50 PM · 35 points · 6 comments · 3 min read
- Dont focus on updating P doom (Algon) · May 1, 2025, 11:10 AM · 7 points · 3 comments · 2 min read
- Prioritizing Work (jefftk) · May 1, 2025, 2:00 AM · 106 points · 11 comments · 1 min read · (www.jefftk.com)
- Don’t rely on a “race to the top” (sjadler) · May 1, 2025, 12:33 AM · 4 points · 0 comments · 1 min read
- Meta-Technicalities: Safeguarding Values in Formal Systems (LTM) · Apr 30, 2025, 11:43 PM · 2 points · 0 comments · 3 min read · (routecause.substack.com)
- Obstacles in ARC’s agenda: Finding explanations (David Matolcsi) · Apr 30, 2025, 11:03 PM · 122 points · 10 comments · 17 min read
- GPT-4o Responds to Negative Feedback (Zvi) · Apr 30, 2025, 8:20 PM · 45 points · 2 comments · 18 min read · (thezvi.wordpress.com)
- State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost] (Noosphere89) · Apr 30, 2025, 7:58 PM · 7 points · 0 comments · 5 min read · (www.interconnects.ai)
- Don’t accuse your interlocutor of being insufficiently truth-seeking (TFD) · Apr 30, 2025, 7:38 PM · 30 points · 15 comments · 2 min read · (www.thefloatingdroid.com)
- How can we solve diffuse threats like research sabotage with AI control? (Vivek Hebbar) · Apr 30, 2025, 7:23 PM · 52 points · 1 comment · 8 min read
- [Question] Can Narrowing One’s Reference Class Undermine the Doomsday Argument? (Iannoose n.) · Apr 30, 2025, 6:24 PM · 2 points · 1 comment · 1 min read
- [Question] Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions? (Zack Friedman) · Apr 30, 2025, 6:22 PM · 5 points · 4 comments · 1 min read
- Distilling the Internal Model Principle part II (JoseFaustino) · Apr 30, 2025, 5:56 PM · 15 points · 0 comments · 19 min read
- Research Priorities for Hardware-Enabled Mechanisms (HEMs) (aog) · Apr 30, 2025, 5:43 PM · 17 points · 2 comments · 15 min read · (www.longview.org)
- Video and transcript of talk on automating alignment research (Joe Carlsmith) · Apr 30, 2025, 5:43 PM · 21 points · 0 comments · 24 min read · (joecarlsmith.com)
- Can we safely automate alignment research? (Joe Carlsmith) · Apr 30, 2025, 5:37 PM · 54 points · 29 comments · 48 min read · (joecarlsmith.com)
- Investigating task-specific prompts and sparse autoencoders for activation monitoring (Henk Tillman) · Apr 30, 2025, 5:09 PM · 23 points · 0 comments · 1 min read · (arxiv.org)
- European Links (30.04.25) (Martin Sustrik) · Apr 30, 2025, 3:40 PM · 15 points · 1 comment · 8 min read · (250bpm.substack.com)
- Scaling Laws for Scalable Oversight (Subhash Kantamneni, Josh Engels, David Baek and Max Tegmark) · Apr 30, 2025, 12:13 PM · 27 points · 0 comments · 9 min read