Liv Boeree—non-zero hero

James Stephen Brown
13 Jul 2025 23:49 UTC
1 point
0 comments
2 min read
LW link
(nonzerosum.games)

Moloch’s Demise—solving the original problem

James Stephen Brown
13 Jul 2025 23:29 UTC
9 points
8 comments
1 min read
LW link
(nonzerosum.games)

4 Ways Moloch is Ruining Your Life!—a listicle that shows Moloch is all around us, even in listicles

James Stephen Brown
13 Jul 2025 23:27 UTC
5 points
0 comments
2 min read
LW link
(nonzerosum.games)

Three Missing Cakes, or One Turbulent Critic?

Benquo
13 Jul 2025 23:08 UTC
25 points
21 comments
3 min read
LW link

O(1) reasoning in latent space: 1ms inference, 77% accuracy, no attention or tokens

Founder Order One
13 Jul 2025 22:54 UTC
−11 points
9 comments
2 min read
LW link

On actually taking expressions literally: tension as the key to meditation?

Chris_Leong
13 Jul 2025 22:49 UTC
16 points
12 comments
5 min read
LW link

[Question] Why is LW not about winning?

azergante
13 Jul 2025 22:36 UTC
21 points
21 comments
1 min read
LW link

LLMs are stuck in Plato’s cave

Sean Herrington
13 Jul 2025 20:37 UTC
7 points
3 comments
6 min read
LW link

Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings

13 Jul 2025 19:54 UTC
51 points
5 comments
18 min read
LW link

10x more training compute = 5x greater task length (kind of)

Expertium
13 Jul 2025 18:40 UTC
48 points
8 comments
2 min read
LW link

How Fast is Algorithmic Progress in AI Inference?

13 Jul 2025 18:26 UTC
6 points
4 comments
7 min read
LW link

xAI’s Grok 4 has no meaningful safety guardrails

eleventhsavi0r
13 Jul 2025 18:22 UTC
84 points
15 comments
6 min read
LW link

You can get LLMs to say almost anything you want

Kaj_Sotala
13 Jul 2025 16:30 UTC
80 points
10 comments
14 min read
LW link

The Fear

NicholasKees
13 Jul 2025 16:20 UTC
29 points
1 comment
5 min read
LW link

Efficiently Detecting Hidden Reasoning with a Small Predictor Model

13 Jul 2025 16:04 UTC
33 points
3 comments
16 min read
LW link

Mapping the off-target effects of every FDA-approved drug in existence

Abhishaike Mahajan
13 Jul 2025 15:21 UTC
24 points
1 comment
20 min read
LW link
(www.owlposting.com)

Memory Decoding Journal Club: Binary and analog variation of synapses between cortical pyramidal neurons

Devin Ward
13 Jul 2025 4:00 UTC
2 points
0 comments
1 min read
LW link

against that one rationalist mashal about japanese fifth-columnists

Fraser
13 Jul 2025 1:42 UTC
72 points
6 comments
3 min read
LW link
(frvser.com)

Win-Win-Win Ethics—Reconciling Consequentialism, Virtue Ethics and Deontology

James Stephen Brown
13 Jul 2025 1:42 UTC
7 points
2 comments
5 min read
LW link
(nonzerosum.games)

Why do LLMs hallucinate?

Nina Panickssery
13 Jul 2025 0:09 UTC
24 points
1 comment
5 min read
LW link
(ninapanickssery.substack.com)

Surprises and learnings from almost two months of Leo Panickssery

Nina Panickssery
12 Jul 2025 23:33 UTC
210 points
12 comments
6 min read
LW link
(ninapanickssery.substack.com)

Stop and check! The parable of the prince and the dog

Dumbledore's Army
12 Jul 2025 17:45 UTC
36 points
0 comments
2 min read
LW link

Take Precautionary Measures Against Superhuman AI Persuasion

Yitz
12 Jul 2025 5:34 UTC
14 points
9 comments
2 min read
LW link

Vitalik’s Response to AI 2027

Daniel Kokotajlo
11 Jul 2025 21:43 UTC
116 points
53 comments
12 min read
LW link
(vitalik.eth.limo)

the jackpot age

thiccythot
11 Jul 2025 21:05 UTC
263 points
17 comments
4 min read
LW link

OpenAI Model Differentiation 101

Zvi
11 Jul 2025 20:30 UTC
31 points
5 comments
11 min read
LW link
(thezvi.wordpress.com)

ACX Cape Town

teegs
11 Jul 2025 18:46 UTC
1 point
0 comments
1 min read
LW link

Adding noise to a sandbagging model can reveal its true capabilities

TheManxLoiner
11 Jul 2025 16:56 UTC
16 points
1 comment
6 min read
LW link

Reflections on AI Companionship and Rational Vulnerability (Or, how I almost fell in love with an anime Catgirl LLM).

Noah Weinberger
11 Jul 2025 16:12 UTC
11 points
2 comments
8 min read
LW link

The Perils of Optimizing Learned Reward Functions

Lukas Fluri
11 Jul 2025 16:06 UTC
17 points
1 comment
21 min read
LW link

Every Universe Thinks It’s the Realest One

Commander Zander
11 Jul 2025 15:45 UTC
15 points
1 comment
4 min read
LW link

On open-science research labs on discord, and getting more people.

Seon Gunness
11 Jul 2025 12:05 UTC
9 points
2 comments
34 min read
LW link

Memory Decoding Journal Club: Binary and analog variation of synapses between cortical pyramidal neurons

Devin Ward
11 Jul 2025 4:47 UTC
1 point
0 comments
1 min read
LW link

Deconfusing ‘AI’ and ‘evolution’

Remmelt
11 Jul 2025 1:44 UTC
12 points
9 comments
26 min read
LW link

So You Think You’ve Awoken ChatGPT

JustisMills
11 Jul 2025 1:01 UTC
310 points
87 comments
9 min read
LW link

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

habryka
11 Jul 2025 0:23 UTC
97 points
43 comments
6 min read
LW link
(metr.org)

On thinking about AI risks concretely

zeshen
11 Jul 2025 0:04 UTC
9 points
4 comments
4 min read
LW link

Metacognition and Self-Modeling in LLMs

Christopher Ackerman
10 Jul 2025 21:25 UTC
19 points
2 comments
16 min read
LW link

My take on AI Alignment: Corporate misalignment and DAOs

act65
10 Jul 2025 20:33 UTC
7 points
3 comments
1 min read
LW link

what makes Claude 3 Opus misaligned

janus
10 Jul 2025 20:06 UTC
104 points
11 comments
5 min read
LW link

The Rising Premium of Life, Or: How We Learned to Start Worrying and Fear Everything

Linch
10 Jul 2025 19:12 UTC
10 points
10 comments
1 min read
LW link
(linch.substack.com)

Lessons from the Iraq War for AI policy

Buck
10 Jul 2025 18:52 UTC
190 points
25 comments
4 min read
LW link

Linkpost: Redwood Research reading list

Julian Stastny
10 Jul 2025 18:39 UTC
50 points
0 comments
1 min read
LW link
(redwoodresearch.substack.com)

Generalized Hangriness: A Standard Rationalist Stance Toward Emotions

johnswentworth
10 Jul 2025 18:22 UTC
359 points
69 comments
7 min read
LW link

The bitter lesson of misuse detection

10 Jul 2025 14:50 UTC
37 points
6 comments
7 min read
LW link

Evaluating and monitoring for AI scheming

10 Jul 2025 14:24 UTC
52 points
9 comments
5 min read
LW link
(deepmindsafetyresearch.medium.com)

White Box Control at UK AISI—Update on Sandbagging Investigations

10 Jul 2025 13:37 UTC
78 points
10 comments
18 min read
LW link

AI #124: Grokless Interlude

Zvi
10 Jul 2025 12:40 UTC
28 points
5 comments
43 min read
LW link
(thezvi.wordpress.com)

How many OOMs of compute span the human range?

tickybob
10 Jul 2025 11:51 UTC
12 points
6 comments
1 min read
LW link

The anti-Kardashev scale is a better measure of civilizational power

RussellThor
10 Jul 2025 10:02 UTC
5 points
2 comments
3 min read
LW link