Eliezer’s Unteachable Methods of Sanity

Eliezer Yudkowsky, 7 Dec 2025 2:46 UTC
527 points
156 comments, 10 min read, LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan, 21 Dec 2025 10:14 UTC
444 points
66 comments, 3 min read, LW link

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes, 3 Dec 2025 18:37 UTC
375 points
92 comments, 17 min read, LW link

Toss a bitcoin to your Lightcone – LW + Lighthaven’s 2026 fundraiser

habryka, 13 Dec 2025 19:32 UTC
316 points
129 comments, 52 min read, LW link

Opinionated Takes on Meetups Organizing

jenn, 20 Dec 2025 0:17 UTC
251 points
34 comments, 9 min read, LW link

AI in 2025: gestalt

technicalities, 7 Dec 2025 21:25 UTC
248 points
44 comments, 20 min read, LW link

How to game the METR plot

shash42, 20 Dec 2025 13:46 UTC
243 points
32 comments, 5 min read, LW link

Insights into Claude Opus 4.5 from Pokémon

Julian Bradshaw, 9 Dec 2025 16:57 UTC
222 points
24 comments, 10 min read, LW link

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt, 26 Dec 2025 16:37 UTC
215 points
18 comments, 3 min read, LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
204 points
31 comments, 16 min read, LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala, 13 Dec 2025 12:38 UTC
203 points
71 comments, 29 min read, LW link

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
201 points
25 comments, 9 min read, LW link

Contradict my take on OpenPhil’s past AI beliefs

Eliezer Yudkowsky, 20 Dec 2025 21:15 UTC
197 points
94 comments, 3 min read, LW link

Shallow review of technical AI safety, 2025

17 Dec 2025 18:18 UTC
193 points
9 comments, 47 min read, LW link

Scientific breakthroughs of the year

technicalities, 16 Dec 2025 18:00 UTC
185 points
13 comments, 3 min read, LW link
(x.com)

MIRI’s 2025 Fundraiser

alexvermeer, 2 Dec 2025 1:53 UTC
176 points
7 comments, 8 min read, LW link

An Ambitious Vision for Interpretability

leogao, 5 Dec 2025 22:57 UTC
175 points
8 comments, 4 min read, LW link

Little Echo

Zvi, 8 Dec 2025 15:30 UTC
161 points
15 comments, 2 min read, LW link
(thezvi.wordpress.com)

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
154 points
11 comments, 8 min read, LW link
(arxiv.org)

Weird Generalization & Inductive Backdoors

11 Dec 2025 18:18 UTC
153 points
8 comments, 8 min read, LW link

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

ryan_greenblatt, 22 Dec 2025 17:21 UTC
153 points
19 comments, 7 min read, LW link

The funding conversation we left unfinished

jenn, 10 Dec 2025 2:17 UTC
151 points
3 comments, 3 min read, LW link

Can Claude teach me to make coffee?

philh, 21 Dec 2025 16:23 UTC
151 points
25 comments, 16 min read, LW link

A high integrity/epistemics political coalition?

Raemon, 14 Dec 2025 22:21 UTC
149 points
34 comments, 13 min read, LW link

I said hello and greeted 1,000 people at 5am this morning

Declan Molony, 8 Dec 2025 3:35 UTC
142 points
7 comments, 2 min read, LW link

A Pragmatic Vision for Interpretability

1 Dec 2025 13:05 UTC
139 points
39 comments, 27 min read, LW link

My AGI safety re­search—2025 re­view, ’26 plans

Steven Byrnes, 11 Dec 2025 17:05 UTC
137 points
4 comments, 12 min read, LW link

Dancing in a World of Horseradish

lsusr, 17 Dec 2025 5:50 UTC
136 points
31 comments, 4 min read, LW link

How middle powers may prevent the development of artificial superintelligence

1 Dec 2025 16:48 UTC
132 points
12 comments, 3 min read, LW link
(asi-prevention.com)

You Can Just Buy Far-UVC

jefftk, 13 Dec 2025 13:10 UTC
126 points
26 comments, 1 min read, LW link
(www.jefftk.com)

The CIA Poisoned My Dog: Two Stories About Paranoid Delusions and Damage Control

River, 29 Dec 2025 3:59 UTC
125 points
2 comments, 5 min read, LW link

Small Models Can Introspect, Too

vgel, 21 Dec 2025 22:20 UTC
124 points
8 comments, 4 min read, LW link
(vgel.me)

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
121 points
11 comments, 4 min read, LW link

Announcing: OpenAI’s Alignment Research Blog

Naomi Bashkansky, 1 Dec 2025 19:52 UTC
120 points
11 comments, 1 min read, LW link

Scalable End-to-End Interpretability

jsteinhardt, 18 Dec 2025 22:37 UTC
120 points
3 comments, 3 min read, LW link

Defending Against Model Weight Exfiltration Through Inference Verification

15 Dec 2025 15:26 UTC
120 points
15 comments, 8 min read, LW link

We need a field of Reward Function Design

Steven Byrnes, 8 Dec 2025 19:15 UTC
118 points
12 comments, 5 min read, LW link

Good if make prior after data instead of before

dynomight, 18 Dec 2025 17:53 UTC
117 points
18 comments, 9 min read, LW link
(dynomight.net)

Don’t Sell Stock to Donate

jefftk, 30 Dec 2025 19:50 UTC
113 points
13 comments, 2 min read, LW link
(www.jefftk.com)

Are We In A Coding Overhang?

Michaël Trazzi, 27 Dec 2025 8:16 UTC
110 points
14 comments, 3 min read, LW link

Technoromanticism

lsusr, 21 Dec 2025 9:00 UTC
110 points
20 comments, 5 min read, LW link

Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen, 17 Dec 2025 18:10 UTC
110 points
17 comments, 5 min read, LW link

What’s going on at CFAR? (Updates and Fundraiser)

AnnaSalamon, 30 Dec 2025 5:00 UTC
110 points
39 comments, 30 min read, LW link

Follow-through on Bay Solstice

Raemon, 10 Dec 2025 22:07 UTC
106 points
22 comments, 6 min read, LW link

Clipboard Normalization

jefftk, 25 Dec 2025 13:50 UTC
105 points
9 comments, 1 min read, LW link
(www.jefftk.com)

Help keep AI under human control: Palisade Research 2026 fundraiser

18 Dec 2025 23:41 UTC
105 points
66 comments, 6 min read, LW link

Auditing Games for Sandbagging [paper]

9 Dec 2025 18:37 UTC
103 points
4 comments, 10 min read, LW link

[Question] Why does Eliezer make abrasive public comments?

k64, 22 Dec 2025 16:45 UTC
97 points
65 comments, 1 min read, LW link

Announcing Gemma Scope 2

22 Dec 2025 21:56 UTC
96 points
1 comment, 2 min read, LW link

why america can’t build ships

bhauth, 6 Dec 2025 0:35 UTC
95 points
18 comments, 6 min read, LW link
(www.bhauth.com)