Eliezer’s Unteachable Methods of Sanity

Eliezer Yudkowsky · 7 Dec 2025 2:46 UTC
491 points
147 comments · 10 min read · LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan · 21 Dec 2025 10:14 UTC
407 points
65 comments · 3 min read · LW link

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes · 3 Dec 2025 18:37 UTC
357 points
89 comments · 17 min read · LW link

Toss a bitcoin to your Lightcone – LW + Lighthaven’s 2026 fundraiser

habryka · 13 Dec 2025 19:32 UTC
310 points
129 comments · 52 min read · LW link

Opinionated Takes on Meetups Organizing

jenn · 20 Dec 2025 0:17 UTC
247 points
34 comments · 9 min read · LW link

AI in 2025: gestalt

technicalities · 7 Dec 2025 21:25 UTC
246 points
44 comments · 20 min read · LW link

How to game the METR plot

shash42 · 20 Dec 2025 13:46 UTC
236 points
29 comments · 5 min read · LW link

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt · 26 Dec 2025 16:37 UTC
212 points
18 comments · 3 min read · LW link

Insights into Claude Opus 4.5 from Pokémon

Julian Bradshaw · 9 Dec 2025 16:57 UTC
206 points
24 comments · 10 min read · LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala · 13 Dec 2025 12:38 UTC
198 points
66 comments · 29 min read · LW link

Contradict my take on OpenPhil’s past AI beliefs

Eliezer Yudkowsky · 20 Dec 2025 21:15 UTC
194 points
92 comments · 3 min read · LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
189 points
27 comments · 16 min read · LW link

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
184 points
23 comments · 9 min read · LW link

Scientific breakthroughs of the year

technicalities · 16 Dec 2025 18:00 UTC
178 points
13 comments · 3 min read · LW link
(x.com)

MIRI’s 2025 Fundraiser

alexvermeer · 2 Dec 2025 1:53 UTC
176 points
7 comments · 8 min read · LW link

Shallow review of technical AI safety, 2025

17 Dec 2025 18:18 UTC
175 points
9 comments · 83 min read · LW link

An Ambitious Vision for Interpretability

leogao · 5 Dec 2025 22:57 UTC
168 points
7 comments · 4 min read · LW link

Little Echo

Zvi · 8 Dec 2025 15:30 UTC
160 points
15 comments · 2 min read · LW link
(thezvi.wordpress.com)

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
153 points
11 comments · 8 min read · LW link
(arxiv.org)

Weird Generalization & Inductive Backdoors

11 Dec 2025 18:18 UTC
152 points
8 comments · 8 min read · LW link

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

ryan_greenblatt · 22 Dec 2025 17:21 UTC
152 points
18 comments · 7 min read · LW link

A high integrity/epistemics political coalition?

Raemon · 14 Dec 2025 22:21 UTC
148 points
34 comments · 13 min read · LW link

The funding conversation we left unfinished

jenn · 10 Dec 2025 2:17 UTC
147 points
3 comments · 3 min read · LW link

Dancing in a World of Horseradish

lsusr · 17 Dec 2025 5:50 UTC
134 points
31 comments · 4 min read · LW link

My AGI safety research—2025 review, ’26 plans

Steven Byrnes · 11 Dec 2025 17:05 UTC
133 points
4 comments · 12 min read · LW link

A Pragmatic Vision for Interpretability

1 Dec 2025 13:05 UTC
131 points
39 comments · 27 min read · LW link

I said hello and greeted 1,000 people at 5am this morning

Declan Molony · 8 Dec 2025 3:35 UTC
128 points
7 comments · 2 min read · LW link

How middle powers may prevent the development of artificial superintelligence

1 Dec 2025 16:48 UTC
127 points
12 comments · 3 min read · LW link
(asi-prevention.com)

You Can Just Buy Far-UVC

jefftk · 13 Dec 2025 13:10 UTC
123 points
26 comments · 1 min read · LW link
(www.jefftk.com)

The CIA Poisoned My Dog: Two Stories About Paranoid Delusions and Damage Control

River · 29 Dec 2025 3:59 UTC
123 points
2 comments · 5 min read · LW link

Small Models Can Introspect, Too

vgel · 21 Dec 2025 22:20 UTC
121 points
8 comments · 4 min read · LW link
(vgel.me)

Announcing: OpenAI’s Alignment Research Blog

Naomi Bashkansky · 1 Dec 2025 19:52 UTC
120 points
11 comments · 1 min read · LW link

Can Claude teach me to make coffee?

philh · 21 Dec 2025 16:23 UTC
120 points
19 comments · 16 min read · LW link

Defending Against Model Weight Exfiltration Through Inference Verification

15 Dec 2025 15:26 UTC
119 points
15 comments · 8 min read · LW link

We need a field of Reward Function Design

Steven Byrnes · 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read · LW link

Scalable End-to-End Interpretability

jsteinhardt · 18 Dec 2025 22:37 UTC
117 points
2 comments · 3 min read · LW link

Good if make prior after data instead of before

dynomight · 18 Dec 2025 17:53 UTC
113 points
15 comments · 9 min read · LW link
(dynomight.net)

Technoromanticism

lsusr · 21 Dec 2025 9:00 UTC
111 points
18 comments · 5 min read · LW link

Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen · 17 Dec 2025 18:10 UTC
110 points
17 comments · 5 min read · LW link

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
109 points
8 comments · 4 min read · LW link

Don’t Sell Stock to Donate

jefftk · 30 Dec 2025 19:50 UTC
109 points
13 comments · 2 min read · LW link
(www.jefftk.com)

What’s going on at CFAR? (Updates and Fundraiser)

AnnaSalamon · 30 Dec 2025 5:00 UTC
108 points
39 comments · 35 min read · LW link

Are We In A Coding Overhang?

Michaël Trazzi · 27 Dec 2025 8:16 UTC
107 points
14 comments · 3 min read · LW link

Clipboard Normalization

jefftk · 25 Dec 2025 13:50 UTC
105 points
9 comments · 1 min read · LW link
(www.jefftk.com)

Help keep AI under human control: Palisade Research 2026 fundraiser

18 Dec 2025 23:41 UTC
105 points
66 comments · 6 min read · LW link

Follow-through on Bay Solstice

Raemon · 10 Dec 2025 22:07 UTC
104 points
22 comments · 6 min read · LW link

Auditing Games for Sandbagging [paper]

9 Dec 2025 18:37 UTC
103 points
4 comments · 10 min read · LW link

[Question] Why does Eliezer make abrasive public comments?

k64 · 22 Dec 2025 16:45 UTC
96 points
65 comments · 1 min read · LW link

Announcing Gemma Scope 2

22 Dec 2025 21:56 UTC
94 points
1 comment · 2 min read · LW link

Catch-Up Algorithmic Progress Might Actually be 60× per Year

Aaron_Scher · 24 Dec 2025 21:03 UTC
92 points
16 comments · 10 min read · LW link