Improved visualizations of METR Time Horizons paper. · LDJ · Mar 19, 2025, 11:36 PM · 20 points · 4 comments · 2 min read
Is CCP authoritarianism good for building safe AI? · Hruss · Mar 19, 2025, 11:13 PM · 1 point · 0 comments · 1 min read
The case against “The case against AI alignment” · KvmanThinking · Mar 19, 2025, 10:40 PM · 2 points · 0 comments · 1 min read
[Question] Superintelligence Strategy: A Pragmatic Path to… Doom? · Mr Beastly · Mar 19, 2025, 10:30 PM · 6 points · 0 comments · 3 min read
SHIFT relies on token-level features to de-bias Bias in Bios probes · Tim Hua · Mar 19, 2025, 9:29 PM · 39 points · 2 comments · 6 min read
Janet must die · Shmi · Mar 19, 2025, 8:35 PM · 12 points · 3 comments · 2 min read
[Question] Why am I getting downvoted on Lesswrong? · Oxidize · Mar 19, 2025, 6:32 PM · 7 points · 14 comments · 1 min read
Forecasting AI Futures Resource Hub (forecastingaifutures.substack.com) · Alvin Ånestrand · Mar 19, 2025, 5:26 PM · 2 points · 0 comments · 2 min read
TBC episode w Dave Kasten from Control AI on AI Policy (www.thebayesianconspiracy.com) · Eneasz · Mar 19, 2025, 5:09 PM · 8 points · 0 comments · 1 min read
Prioritizing threats for AI control · ryan_greenblatt · Mar 19, 2025, 5:09 PM · 58 points · 2 comments · 10 min read
The Illusion of Transparency as a Trust-Building Mechanism · Priyanka Bharadwaj · Mar 19, 2025, 5:09 PM · 2 points · 0 comments · 1 min read
How Do We Govern AI Well? · kaime · Mar 19, 2025, 5:08 PM · 2 points · 0 comments · 25 min read
METR: Measuring AI Ability to Complete Long Tasks (metr.org) · Zach Stein-Perlman · Mar 19, 2025, 4:00 PM · 241 points · 105 comments · 5 min read
Why I think AI will go poorly for humanity · Alek Westover · Mar 19, 2025, 3:52 PM · 13 points · 0 comments · 30 min read
The principle of genomic liberty · TsviBT · Mar 19, 2025, 2:27 PM · 76 points · 51 comments · 17 min read
Going Nova (thezvi.wordpress.com) · Zvi · Mar 19, 2025, 1:30 PM · 64 points · 14 comments · 15 min read
Equations Mean Things · abstractapplic · Mar 19, 2025, 8:16 AM · 46 points · 10 comments · 3 min read
Elite Coordination via the Consensus of Power (www.mindthefuture.info) · Richard_Ngo · Mar 19, 2025, 6:56 AM · 92 points · 15 comments · 12 min read
What I am working on right now and why: representation engineering edition · Lukasz G Bartoszcze · Mar 18, 2025, 10:37 PM · 3 points · 0 comments · 3 min read
Boots theory and Sybil Ramkin (reasonableapproximation.net) · philh · Mar 18, 2025, 10:10 PM · 37 points · 17 comments · 11 min read
Schmidt Sciences Technical AI Safety RFP on Inference-Time Compute – Deadline: April 30 (www.schmidtsciences.org) · Ryan Gajarawala · Mar 18, 2025, 6:05 PM · 18 points · 0 comments · 2 min read
PRISM: Perspective Reasoning for Integrated Synthesis and Mediation (Interactive Demo) · Anthony Diamond · Mar 18, 2025, 6:03 PM · 10 points · 2 comments · 1 min read
Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models · Le magicien quantique · Mar 18, 2025, 5:55 PM · 6 points · 1 comment · 10 min read
Progress links and short notes, 2025-03-18 (newsletter.rootsofprogress.org) · jasoncrawford · Mar 18, 2025, 5:14 PM · 8 points · 0 comments · 3 min read
The Convergent Path to the Stars · Maxime Riché · Mar 18, 2025, 5:09 PM · 6 points · 0 comments · 20 min read
Sapir-Whorf Ego Death (honestliving.substack.com) · Jonathan Moregård · Mar 18, 2025, 4:57 PM · 8 points · 7 comments · 2 min read
Smelling Nice is Good, Actually (uncertainupdates.substack.com) · Gordon Seidoh Worley · Mar 18, 2025, 4:54 PM · 28 points · 8 comments · 3 min read
A Taxonomy of Jobs Deeply Resistant to TAI Automation (www.convergenceanalysis.org) · Deric Cheng · Mar 18, 2025, 4:25 PM · 9 points · 0 comments · 12 min read
Why Are The Human Sciences Hard? Two New Hypotheses · Aydin Mohseni, Daniel Herrmann and ben_levinstein · Mar 18, 2025, 3:45 PM · 39 points · 14 comments · 9 min read
Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions · Stuart_Armstrong and rgorman · Mar 18, 2025, 2:48 PM · 80 points · 12 comments · 5 min read
[Question] What is the theory of change behind writing papers about AI safety? · Kajus · Mar 18, 2025, 12:51 PM · 7 points · 1 comment · 1 min read
OpenAI #11: America Action Plan (thezvi.wordpress.com) · Zvi · Mar 18, 2025, 12:50 PM · 83 points · 3 comments · 6 min read
I changed my mind about orca intelligence · Towards_Keeperhood · Mar 18, 2025, 10:15 AM · 46 points · 24 comments · 5 min read
[Question] Is Peano arithmetic trying to kill us? Do we care? · Q Home · Mar 18, 2025, 8:22 AM · 17 points · 2 comments · 2 min read
Do What the Mammals Do · CrimsonChin · Mar 18, 2025, 3:57 AM · 2 points · 6 comments · 4 min read
What Actually Matters Until We Reach the Singularity · Lexius · Mar 18, 2025, 2:17 AM · −1 points · 0 comments · 9 min read
Meaning as a cognitive substitute for survival instincts: A thought experiment · Ovidijus Šimkus · Mar 18, 2025, 1:53 AM · 0 points · 0 comments · 2 min read
Against Yudkowsky’s evolution analogy for AI x-risk [unfinished] · Fiora Sunshine · Mar 18, 2025, 1:41 AM · 50 points · 18 comments · 11 min read
An “AI researcher” has written a paper on optimizing AI architecture and optimized a language model to several orders of magnitude more efficiency. · Y B · Mar 18, 2025, 1:15 AM · 3 points · 1 comment · 1 min read
LessOnline 2025: Early Bird Tickets On Sale · Ben Pace · Mar 18, 2025, 12:22 AM · 37 points · 5 comments · 5 min read
Feedback loops for exercise (VO2Max) (acesounderglass.com) · Elizabeth · Mar 18, 2025, 12:10 AM · 63 points · 12 comments · 8 min read
FrontierMath Score of o3-mini Much Lower Than Claimed · YafahEdelman · Mar 17, 2025, 10:41 PM · 61 points · 7 comments · 1 min read
Proof-of-Concept Debugger for a Small LLM · Peter Lai and StefanHex · Mar 17, 2025, 10:27 PM · 27 points · 0 comments · 11 min read
Effectively Communicating with DC Policymakers · PolicyTakes · Mar 17, 2025, 10:11 PM · 14 points · 0 comments · 2 min read
Mind the Gap (dxmrevealed.wordpress.com) · Bridgett Kay · Mar 17, 2025, 9:59 PM · 8 points · 0 comments · 5 min read
EIS XV: A New Proof of Concept for Useful Interpretability · scasper · Mar 17, 2025, 8:05 PM · 30 points · 2 comments · 3 min read
Sentinel’s Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise. (blog.sentinel-team.org) · NunoSempere · Mar 17, 2025, 7:34 PM · 59 points · 3 comments · 6 min read
Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations · Nicholas Goldowsky-Dill, Mikita Balesni, Jérémy Scheurer and Marius Hobbhahn · 17 Mar 2025 19:11 UTC · 182 points · 9 comments · 6 min read
Things Look Bleak for White-Collar Jobs Due to AI Acceleration · Declan Molony · 17 Mar 2025 17:03 UTC · 15 points · 0 comments · 10 min read
Three Types of Intelligence Explosion (www.forethought.org) · rosehadshar, Tom Davidson and wdmacaskill · 17 Mar 2025 14:47 UTC · 39 points · 8 comments · 3 min read