HPMOR: The (Probably) Untold Lore

25 Jul 2025 18:39 UTC
421 points
156 comments · 38 min read · LW link

Generalized Hangriness: A Standard Rationalist Stance Toward Emotions

johnswentworth 10 Jul 2025 18:22 UTC
359 points
69 comments · 7 min read · LW link

Subliminal Learning: LLMs Transmit Behavioral Traits via Hidden Signals in Data

22 Jul 2025 16:37 UTC
337 points
35 comments · 4 min read · LW link

So You Think You’ve Awoken ChatGPT

JustisMills 11 Jul 2025 1:01 UTC
310 points
87 comments · 9 min read · LW link

Make More Grayspaces

Duncan Sabien (Inactive) 19 Jul 2025 22:22 UTC
296 points
65 comments · 13 min read · LW link

Love stays loved (formerly “Skin”)

Swimmer963 (Miranda Dixon-Luinenburg) 18 Jul 2025 19:17 UTC
271 points
12 comments · 29 min read · LW link

the jackpot age

thiccythot 11 Jul 2025 21:05 UTC
263 points
17 comments · 4 min read · LW link

Shallow Water is Dangerous Too

jefftk 20 Jul 2025 2:30 UTC
222 points
24 comments · 2 min read · LW link
(www.jefftk.com)

Surprises and learnings from almost two months of Leo Panickssery

Nina Panickssery 12 Jul 2025 23:33 UTC
210 points
12 comments · 6 min read · LW link
(ninapanickssery.substack.com)

About 30% of Humanity’s Last Exam chemistry/biology answers are likely wrong

bohaska 29 Jul 2025 11:59 UTC
208 points
10 comments · 4 min read · LW link
(www.futurehouse.org)

Optimizing The Final Output Can Obfuscate CoT (Research Note)

30 Jul 2025 21:26 UTC
196 points
22 comments · 6 min read · LW link

Lessons from the Iraq War for AI policy

Buck 10 Jul 2025 18:52 UTC
190 points
25 comments · 4 min read · LW link

Race and Gender Bias As An Example of Unfaithful Chain of Thought in the Wild

2 Jul 2025 16:35 UTC
181 points
25 comments · 4 min read · LW link

Maya’s Escape

Bridgett Kay 27 Jul 2025 16:47 UTC
180 points
9 comments · 11 min read · LW link
(dxmrevealed.wordpress.com)

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

15 Jul 2025 16:23 UTC
166 points
32 comments · 1 min read · LW link
(bit.ly)

Why Do Some Language Models Fake Alignment While Others Don’t?

8 Jul 2025 21:49 UTC
158 points
14 comments · 5 min read · LW link
(arxiv.org)

An Opinionated Guide to Using Anki Correctly

Luise 8 Jul 2025 20:01 UTC
156 points
58 comments · 27 min read · LW link

“Buckle up bucko, and get ready for multiple hard cognitive steps.”

Raemon 5 Jul 2025 1:47 UTC
149 points
26 comments · 4 min read · LW link

On “ChatGPT Psychosis” and LLM Sycophancy

jdp 23 Jul 2025 1:11 UTC
142 points
28 comments · 18 min read · LW link
(minihf.com)

Shutdown Resistance in Reasoning Models

6 Jul 2025 0:01 UTC
138 points
14 comments · 9 min read · LW link
(palisaderesearch.org)

Do confident short timelines make sense?

15 Jul 2025 3:37 UTC
138 points
76 comments · 69 min read · LW link

Narrow Misalignment is Hard, Emergent Misalignment is Easy

14 Jul 2025 21:05 UTC
130 points
23 comments · 5 min read · LW link

Authors Have a Responsibility to Communicate Clearly

TurnTrout 1 Jul 2025 15:41 UTC
125 points
29 comments · 6 min read · LW link
(turntrout.com)

“What’s my goal?”

Raemon 2 Jul 2025 2:58 UTC
122 points
9 comments · 2 min read · LW link

The Purpose of a System is what it Rewards

robotelvis 26 Jul 2025 22:08 UTC
120 points
16 comments · 2 min read · LW link
(messyprogress.substack.com)

Vitalik’s Response to AI 2027

Daniel Kokotajlo 11 Jul 2025 21:43 UTC
116 points
53 comments · 12 min read · LW link
(vitalik.eth.limo)

If Anyone Builds It, Everyone Dies: Call for Translators (for Supplementary Materials)

yams 21 Jul 2025 22:37 UTC
112 points
12 comments · 1 min read · LW link

Simplex Progress Report - July 2025

28 Jul 2025 21:58 UTC
107 points
2 comments · 15 min read · LW link

what makes Claude 3 Opus misaligned

janus 10 Jul 2025 20:06 UTC
104 points
11 comments · 5 min read · LW link

Curing PMDD with Hair Loss Pills

David Lorell 2 Jul 2025 21:35 UTC
102 points
3 comments · 8 min read · LW link

LLMs Can’t See Pixels or Characters

Brendan Long 20 Jul 2025 20:00 UTC
100 points
44 comments · 4 min read · LW link
(www.brendanlong.com)

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith 17 Jul 2025 17:54 UTC
98 points
19 comments · 34 min read · LW link
(joecarlsmith.substack.com)

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

habryka 11 Jul 2025 0:23 UTC
97 points
43 comments · 6 min read · LW link
(metr.org)

On the functional self of LLMs

eggsyntax 7 Jul 2025 15:39 UTC
95 points
35 comments · 8 min read · LW link

No, Grok, No

Zvi 9 Jul 2025 15:10 UTC
92 points
3 comments · 17 min read · LW link
(thezvi.wordpress.com)

Recent Redwood Research project proposals

14 Jul 2025 22:27 UTC
91 points
0 comments · 3 min read · LW link

‘AI for societal uplift’ as a path to victory

Raymond Douglas 4 Jul 2025 15:32 UTC
85 points
22 comments · 2 min read · LW link

If Anyone Builds It, Everyone Dies: Advertisement design competition

yams 2 Jul 2025 23:14 UTC
85 points
37 comments · 1 min read · LW link
(intelligence.org)

China proposes new global AI cooperation organisation

Matrice Jacobine 30 Jul 2025 2:50 UTC
84 points
8 comments · 1 min read · LW link
(www.reuters.com)

xAI’s Grok 4 has no meaningful safety guardrails

eleventhsavi0r 13 Jul 2025 18:22 UTC
84 points
15 comments · 6 min read · LW link

METR: How Does Time Horizon Vary Across Domains?

14 Jul 2025 16:13 UTC
84 points
8 comments · 14 min read · LW link
(metr.org)

Subway Particle Levels Aren’t That High

jefftk 9 Jul 2025 2:30 UTC
80 points
4 comments · 1 min read · LW link
(www.jefftk.com)

You can get LLMs to say almost anything you want

Kaj_Sotala 13 Jul 2025 16:30 UTC
80 points
10 comments · 14 min read · LW link

White Box Control at UK AISI - Update on Sandbagging Investigations

10 Jul 2025 13:37 UTC
78 points
10 comments · 18 min read · LW link

Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning

23 Jul 2025 14:57 UTC
78 points
3 comments · 5 min read · LW link

OpenAI Claims IMO Gold Medal

Mikhail Samin 19 Jul 2025 9:58 UTC
77 points
74 comments · 1 min read · LW link
(x.com)

Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals

Marius Hobbhahn 3 Jul 2025 15:57 UTC
75 points
0 comments · 1 min read · LW link
(www.apolloresearch.ai)

My Empathy Is Rarely Kind

johnswentworth 30 Jul 2025 3:49 UTC
73 points
229 comments · 4 min read · LW link

against that one rationalist mashal about japanese fifth-columnists

Fraser 13 Jul 2025 1:42 UTC
72 points
6 comments · 3 min read · LW link
(frvser.com)

Directly Try Solving Alignment for 5 weeks

Kabir Kumar 21 Jul 2025 21:51 UTC
71 points
2 comments · 6 min read · LW link
(beta.ai-plans.com)