Tell me about yourself: LLMs are aware of their learned behaviors

Jan 22, 2025, 12:47 AM
130 points
5 comments, 6 min read, LW link

Building AI Research Fleets

Jan 12, 2025, 6:23 PM
130 points
11 comments, 5 min read, LW link

Some articles in “International Security” that I enjoyed

Buck, Jan 31, 2025, 4:23 PM
130 points
10 comments, 4 min read, LW link

The Paris AI Anti-Safety Summit

Zvi, Feb 12, 2025, 2:00 PM
129 points
21 comments, 21 min read, LW link
(thezvi.wordpress.com)

Gradual Disempowerment, Shell Games and Flinches

Jan_Kulveit, Feb 2, 2025, 2:47 PM
129 points
36 comments, 6 min read, LW link

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit, Mar 28, 2025, 9:03 PM
128 points
14 comments, 13 min read, LW link

AI-enabled coups: a small group could use AI to seize power

Apr 16, 2025, 4:51 PM
128 points
18 comments, 7 min read, LW link

Parkinson’s Law and the Ideology of Statistics

Benquo, Jan 4, 2025, 3:49 PM
127 points
7 comments, 8 min read, LW link
(benjaminrosshoffman.com)

The Intelligence Curse

lukedrago, Jan 3, 2025, 7:07 PM
126 points
27 comments, 18 min read, LW link
(lukedrago.substack.com)

Do models say what they learn?

Mar 22, 2025, 3:19 PM
126 points
12 comments, 13 min read, LW link

Meditations on Doge

Martin Sustrik, May 25, 2025, 12:00 PM
125 points
42 comments, 9 min read, LW link
(250bpm.substack.com)

Anthropic, and taking “technical philosophy” more seriously

Raemon, Mar 13, 2025, 1:48 AM
125 points
29 comments, 11 min read, LW link

Social Anxiety Isn’t About Being Liked

Chipmonk, May 16, 2025, 10:26 PM
124 points
21 comments, 2 min read, LW link
(chrislakin.blog)

[Question] when will LLMs become human-level bloggers?

nostalgebraist, Mar 9, 2025, 9:10 PM
124 points
34 comments, 6 min read, LW link

AI 2027 is a Bet Against Amdahl’s Law

snewman, Apr 21, 2025, 3:09 AM
124 points
56 comments, 9 min read, LW link

Five Hinge-Questions That Decide Whether AGI Is Five Years Away or Twenty

charlieoneill, May 6, 2025, 2:48 AM
124 points
17 comments, 5 min read, LW link

How I’ve run major projects

benkuhn, Mar 16, 2025, 6:40 PM
123 points
10 comments, 8 min read, LW link
(www.benkuhn.net)

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi, Apr 30, 2025, 11:03 PM
122 points
10 comments, 17 min read, LW link

Ctrl-Z: Controlling AI Agents via Resampling

Apr 16, 2025, 4:21 PM
122 points
0 comments, 20 min read, LW link

Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger, Mar 11, 2025, 11:52 AM
121 points
23 comments, 11 min read, LW link
(alignment.anthropic.com)

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red

Julian Bradshaw, Apr 21, 2025, 3:52 AM
121 points
20 comments, 14 min read, LW link

It’s hard to make scheming evals look realistic for LLMs

May 24, 2025, 7:17 PM
120 points
20 comments, 5 min read, LW link

2024 in AI predictions

jessicata, Jan 1, 2025, 8:29 PM
117 points
3 comments, 8 min read, LW link

Research directions Open Phil wants to fund in technical AI safety

Feb 8, 2025, 1:40 AM
117 points
21 comments, 58 min read, LW link
(www.openphilanthropy.org)

Three Months In, Evaluating Three Rationalist Cases for Trump

Arjun Panickssery, Apr 18, 2025, 8:27 AM
115 points
32 comments, 4 min read, LW link

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes, Apr 24, 2025, 1:57 PM
114 points
48 comments, 23 min read, LW link

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

Mar 26, 2025, 7:07 PM
113 points
15 comments, 29 min read, LW link
(deepmindsafetyresearch.medium.com)

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA, Jan 28, 2025, 11:36 PM
112 points
30 comments, 13 min read, LW link

Downstream applications as validation of interpretability progress

Sam Marks, Mar 31, 2025, 1:35 AM
112 points
3 comments, 7 min read, LW link

The News is Never Neglected

lsusr, Feb 11, 2025, 2:59 PM
112 points
18 comments, 1 min read, LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

Feb 6, 2025, 6:58 PM
111 points
0 comments, 1 min read, LW link
(www.openphilanthropy.org)

We should try to automate AI safety work asap

Marius Hobbhahn, Apr 26, 2025, 4:35 PM
111 points
10 comments, 15 min read, LW link

Please Donate to CAIP (Post 1 of 6 on AI Governance)

Mass_Driver, May 7, 2025, 5:13 PM
111 points
20 comments, 33 min read, LW link

You can just wear a suit

lsusr, Feb 26, 2025, 2:57 PM
111 points
48 comments, 2 min read, LW link

One Year in DC

tlevin, May 19, 2025, 7:46 PM
110 points
5 comments, LW link
(www.greentape.pub)

Among Us: A Sandbox for Agentic Deception

Apr 5, 2025, 6:24 AM
110 points
7 comments, 7 min read, LW link

New Cause Area Proposal

CallumMcDougall, Apr 1, 2025, 7:12 AM
109 points
4 comments, 1 min read, LW link

UK AISI’s Alignment Team: Research Agenda

May 7, 2025, 4:33 PM
109 points
2 comments, 11 min read, LW link

Thread for Sense-Making on Recent Murders and How to Sanely Respond

Ben Pace, Jan 31, 2025, 3:45 AM
109 points
146 comments, 2 min read, LW link

2024 Unofficial LessWrong Survey Results

Screwtape, Mar 14, 2025, 10:29 PM
109 points
28 comments, 48 min read, LW link

Aristocracy and Hostage Capital

Arjun Panickssery, Jan 8, 2025, 7:38 PM
108 points
7 comments, 3 min read, LW link
(arjunpanickssery.substack.com)

What OpenAI Told California’s Attorney General

garrison, May 17, 2025, 11:14 PM
108 points
3 comments, LW link
(www.obsolete.pub)

Fake thinking and real thinking

Joe Carlsmith, Jan 28, 2025, 8:05 PM
108 points
13 comments, 38 min read, LW link

Two hemispheres—I do not think it means what you think it means

Viliam, Feb 9, 2025, 3:33 PM
108 points
21 comments, 14 min read, LW link

Notes on the Long Tasks METR paper, from a HCAST task contributor

abstractapplic, May 4, 2025, 11:17 PM
108 points
7 comments, 2 min read, LW link

The Lizardman and the Black Hat Bobcat

Screwtape, Apr 6, 2025, 7:02 PM
107 points
15 comments, 9 min read, LW link

How training-gamers might function (and win)

Vivek Hebbar, Apr 11, 2025, 9:26 PM
107 points
5 comments, 13 min read, LW link

Attribution-based parameter decomposition

Jan 25, 2025, 1:12 PM
107 points
21 comments, 4 min read, LW link
(publications.apolloresearch.ai)

My supervillain origin story

Dmitry Vaintrob, Jan 27, 2025, 12:20 PM
106 points
1 comment, 5 min read, LW link

How do you deal w/ Super Stimuli?

Logan Riggs, Jan 14, 2025, 3:14 PM
106 points
25 comments, 3 min read, LW link