Eliezer’s Unteachable Methods of Sanity

Eliezer Yudkowsky · 7 Dec 2025 2:46 UTC
491 points
147 comments · 10 min read · LW link

Turning 20 in the probable pre-apocalypse

Parv Mahajan · 21 Dec 2025 10:14 UTC
407 points
65 comments · 3 min read · LW link

6 reasons why “alignment-is-hard” discourse seems alien to human intuitions, and vice-versa

Steven Byrnes · 3 Dec 2025 18:37 UTC
357 points
89 comments · 17 min read · LW link

Toss a bitcoin to your Lightcone – LW + Lighthaven’s 2026 fundraiser

habryka · 13 Dec 2025 19:32 UTC
310 points
129 comments · 52 min read · LW link

Opinionated Takes on Meetups Organizing

jenn · 20 Dec 2025 0:17 UTC
247 points
34 comments · 9 min read · LW link

AI in 2025: gestalt

technicalities · 7 Dec 2025 21:25 UTC
246 points
44 comments · 20 min read · LW link

How to game the METR plot

shash42 · 20 Dec 2025 13:46 UTC
236 points
29 comments · 5 min read · LW link

Measuring no CoT math time horizon (single forward pass)

ryan_greenblatt · 26 Dec 2025 16:37 UTC
212 points
18 comments · 3 min read · LW link

Insights into Claude Opus 4.5 from Pokémon

Julian Bradshaw · 9 Dec 2025 16:57 UTC
206 points
24 comments · 10 min read · LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala · 13 Dec 2025 12:38 UTC
198 points
66 comments · 29 min read · LW link

Contradict my take on OpenPhil’s past AI beliefs

Eliezer Yudkowsky · 20 Dec 2025 21:15 UTC
194 points
92 comments · 3 min read · LW link

The behavioral selection model for predicting AI motivations

4 Dec 2025 18:46 UTC
189 points
27 comments · 16 min read · LW link

Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment

21 Dec 2025 0:53 UTC
184 points
23 comments · 9 min read · LW link

Scientific breakthroughs of the year

technicalities · 16 Dec 2025 18:00 UTC
178 points
13 comments · 3 min read · LW link
(x.com)

MIRI’s 2025 Fundraiser

alexvermeer · 2 Dec 2025 1:53 UTC
176 points
7 comments · 8 min read · LW link

Shallow review of technical AI safety, 2025

17 Dec 2025 18:18 UTC
175 points
9 comments · 83 min read · LW link

An Ambitious Vision for Interpretability

leogao · 5 Dec 2025 22:57 UTC
168 points
7 comments · 4 min read · LW link

Little Echo

Zvi · 8 Dec 2025 15:30 UTC
160 points
15 comments · 2 min read · LW link
(thezvi.wordpress.com)

Activation Oracles: Training and Evaluating LLMs as General-Purpose Activation Explainers

18 Dec 2025 20:21 UTC
153 points
11 comments · 8 min read · LW link
(arxiv.org)

Weird Generalization & Inductive Backdoors

11 Dec 2025 18:18 UTC
152 points
8 comments · 8 min read · LW link

Recent LLMs can use filler tokens or problem repeats to improve (no-CoT) math performance

ryan_greenblatt · 22 Dec 2025 17:21 UTC
152 points
18 comments · 7 min read · LW link

A high integrity/epistemics political coalition?

Raemon · 14 Dec 2025 22:21 UTC
148 points
34 comments · 13 min read · LW link

The funding conversation we left unfinished

jenn · 10 Dec 2025 2:17 UTC
147 points
3 comments · 3 min read · LW link

Dancing in a World of Horseradish

lsusr · 17 Dec 2025 5:50 UTC
134 points
31 comments · 4 min read · LW link

My AGI safety research—2025 review, ’26 plans

Steven Byrnes · 11 Dec 2025 17:05 UTC
133 points
4 comments · 12 min read · LW link

A Pragmatic Vision for Interpretability

1 Dec 2025 13:05 UTC
131 points
39 comments · 27 min read · LW link

I said hello and greeted 1,000 people at 5am this morning

Declan Molony · 8 Dec 2025 3:35 UTC
128 points
7 comments · 2 min read · LW link

How middle powers may prevent the development of artificial superintelligence

1 Dec 2025 16:48 UTC
127 points
12 comments · 3 min read · LW link
(asi-prevention.com)

You Can Just Buy Far-UVC

jefftk · 13 Dec 2025 13:10 UTC
123 points
26 comments · 1 min read · LW link
(www.jefftk.com)

The CIA Poisoned My Dog: Two Stories About Paranoid Delusions and Damage Control

River · 29 Dec 2025 3:59 UTC
123 points
2 comments · 5 min read · LW link

Small Models Can Introspect, Too

vgel · 21 Dec 2025 22:20 UTC
121 points
8 comments · 4 min read · LW link
(vgel.me)

Announcing: OpenAI’s Alignment Research Blog

Naomi Bashkansky · 1 Dec 2025 19:52 UTC
120 points
11 comments · 1 min read · LW link

Can Claude teach me to make coffee?

philh · 21 Dec 2025 16:23 UTC
120 points
19 comments · 16 min read · LW link

Defending Against Model Weight Exfiltration Through Inference Verification

15 Dec 2025 15:26 UTC
119 points
15 comments · 8 min read · LW link

We need a field of Reward Function Design

Steven Byrnes · 8 Dec 2025 19:15 UTC
118 points
12 comments · 5 min read · LW link

Scalable End-to-End Interpretability

jsteinhardt · 18 Dec 2025 22:37 UTC
117 points
2 comments · 3 min read · LW link

Good if make prior after data instead of before

dynomight · 18 Dec 2025 17:53 UTC
113 points
15 comments · 9 min read · LW link
(dynomight.net)

Technoromanticism

lsusr · 21 Dec 2025 9:00 UTC
111 points
18 comments · 5 min read · LW link

Announcing RoastMyPost: LLMs Eval Blog Posts and More

ozziegooen · 17 Dec 2025 18:10 UTC
110 points
17 comments · 5 min read · LW link

A Case for Model Persona Research

15 Dec 2025 13:35 UTC
109 points
8 comments · 4 min read · LW link

Don’t Sell Stock to Donate

jefftk · 30 Dec 2025 19:50 UTC
109 points
13 comments · 2 min read · LW link
(www.jefftk.com)

What’s going on at CFAR? (Updates and Fundraiser)

AnnaSalamon · 30 Dec 2025 5:00 UTC
108 points
39 comments · 35 min read · LW link

Are We In A Coding Overhang?

Michaël Trazzi · 27 Dec 2025 8:16 UTC
107 points
14 comments · 3 min read · LW link

Clipboard Normalization

jefftk · 25 Dec 2025 13:50 UTC
105 points
9 comments · 1 min read · LW link
(www.jefftk.com)

Help keep AI under human control: Palisade Research 2026 fundraiser

18 Dec 2025 23:41 UTC
105 points
66 comments · 6 min read · LW link

Follow-through on Bay Solstice

Raemon · 10 Dec 2025 22:07 UTC
104 points
22 comments · 6 min read · LW link

Auditing Games for Sandbagging [paper]

9 Dec 2025 18:37 UTC
103 points
4 comments · 10 min read · LW link

[Question] Why does Eliezer make abrasive public comments?

k64 · 22 Dec 2025 16:45 UTC
96 points
65 comments · 1 min read · LW link

Announcing Gemma Scope 2

22 Dec 2025 21:56 UTC
94 points
1 comment · 2 min read · LW link

Catch-Up Algorithmic Progress Might Actually be 60× per Year

Aaron_Scher · 24 Dec 2025 21:03 UTC
92 points
16 comments · 10 min read · LW link