Four places where you can put LLM monitoring

9 Aug 2025 23:10 UTC
48 points
0 comments · 7 min read · LW link

Output and CoT Monitoring of Customer Service Representatives Shows Default Alignment

Brendan Long · 9 Aug 2025 21:31 UTC
21 points
0 comments · 1 min read · LW link

Live by the Claude, Die by the Claude

Brendan McCord · 9 Aug 2025 20:23 UTC
0 points
3 comments · 7 min read · LW link
(blog.cosmos-institute.org)

GPT-5 vs AI Alignment

Donatas Lučiūnas · 9 Aug 2025 20:05 UTC
−8 points
2 comments · 1 min read · LW link

Saidi, My Friend—what do we owe to each other?

James Stephen Brown · 9 Aug 2025 19:41 UTC
10 points
0 comments · 5 min read · LW link

Self-Questioning

Vadim Golub · 9 Aug 2025 19:18 UTC
−6 points
0 comments · 1 min read · LW link

Testing the Authoritarian Bias of LLMs

9 Aug 2025 18:09 UTC
9 points
1 comment · 6 min read · LW link

Working with AI: Measuring the Occupational Implications of Generative AI

Annapurna · 9 Aug 2025 16:20 UTC
5 points
0 comments · 1 min read · LW link
(jorgevelez.substack.com)

If worker coops are so productive, why aren’t they everywhere?

B Jacobs · 9 Aug 2025 14:47 UTC
35 points
19 comments · 4 min read · LW link
(bobjacobs.substack.com)

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

9 Aug 2025 11:44 UTC
7 points
0 comments · 12 min read · LW link

Against functionalism: a self dialogue

Algon · 9 Aug 2025 11:19 UTC
13 points
9 comments · 1 min read · LW link

With the Future of the World in Your Hands, Think for 6.77 Years!

Dawn Drescher · 9 Aug 2025 10:44 UTC
1 point
0 comments · 10 min read · LW link
(impartial-priorities.org)

Poll on De/Accelerating AI

denkenberger · 9 Aug 2025 7:13 UTC
13 points
38 comments · 1 min read · LW link

[Event] Building What the Future Needs: A curated conference in Berlin (Sep 6, 2025) for high-impact builders and researchers

Vasilii Kondyrev · 8 Aug 2025 23:08 UTC
7 points
0 comments · 2 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward · 8 Aug 2025 22:08 UTC
1 point
0 comments · 1 min read · LW link

Making Sense of Consciousness Part 4: States of Consciousness

sarahconstantin · 8 Aug 2025 21:21 UTC
8 points
0 comments · 5 min read · LW link
(sarahconstantin.substack.com)

What would a human pretending to be an AI say?

Brendan Long · 8 Aug 2025 18:56 UTC
53 points
18 comments · 1 min read · LW link
(www.brendanlong.com)

Will morally motivated actors steer us towards a near-best future?

wdmacaskill · 8 Aug 2025 18:32 UTC
22 points
0 comments · 4 min read · LW link

How hard to achieve is eutopia?

wdmacaskill · 8 Aug 2025 16:16 UTC
22 points
0 comments · 7 min read · LW link

OpenAI’s GPT-OSS Is Already Old News

Zvi · 8 Aug 2025 12:20 UTC
39 points
4 comments · 18 min read · LW link
(thezvi.wordpress.com)

Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitor Performance (Research Note)

8 Aug 2025 10:41 UTC
51 points
7 comments · 10 min read · LW link

The Tortoise and the Language Model (A Fable After Hofstadter)

mwatkins · 8 Aug 2025 10:39 UTC
54 points
4 comments · 3 min read · LW link

Closed Mouth, Open Opportunities

CstineSublime · 8 Aug 2025 10:32 UTC
6 points
0 comments · 4 min read · LW link

How anticipatory cover-ups go wrong

Kaj_Sotala · 8 Aug 2025 10:26 UTC
295 points
25 comments · 6 min read · LW link

Strategic Moderation Goals (a Plan B to AI alignment)

Jim Buhler · 8 Aug 2025 8:08 UTC
2 points
0 comments · 3 min read · LW link

METR’s Evaluation of GPT-5

GradientDissenter · 7 Aug 2025 22:17 UTC
141 points
15 comments · 20 min read · LW link
(metr.github.io)

ChatGPT is the Daguerreotype of AI

Alex_Altair · 7 Aug 2025 22:14 UTC
42 points
2 comments · 7 min read · LW link

Principles of AI Uncontrollability

WillPetillo · 7 Aug 2025 21:10 UTC
1 point
0 comments · 7 min read · LW link

Third-order cognition as a model of superintelligence (ironically: Meta® metacognition)

soycarts · 7 Aug 2025 20:56 UTC
2 points
5 comments · 13 min read · LW link

Yes, Rationalism is a Cult

James Camacho · 7 Aug 2025 20:43 UTC
−14 points
23 comments · 4 min read · LW link

GPT-5 is out

david reinstein · 7 Aug 2025 20:33 UTC
4 points
0 comments · 1 min read · LW link
(openai.com)

OpenAI Releases GPT-5

anaguma · 7 Aug 2025 18:41 UTC
18 points
0 comments · 1 min read · LW link
(openai.com)

Balancing exploration and resistance to memetic threats after AGI

Eric Neyman · 7 Aug 2025 18:03 UTC
26 points
5 comments · 5 min read · LW link

state of the machine

thiccythot · 7 Aug 2025 17:50 UTC
21 points
5 comments · 6 min read · LW link

Chronicles of the Gentle Singularity: A Short Story

Ihor Kendiukhov · 7 Aug 2025 13:50 UTC
21 points
0 comments · 4 min read · LW link

AI #128: Four Hours Until Probably Not The Apocalypse

Zvi · 7 Aug 2025 13:00 UTC
34 points
5 comments · 65 min read · LW link
(thezvi.wordpress.com)

No One is Really Working

Annapurna · 7 Aug 2025 11:19 UTC
5 points
9 comments · 1 min read · LW link
(www.humaninvariant.com)

[Question] Anthropic Is Going All In On Ability Without Intelligence?

Chapin Lenthall-Cleary · 7 Aug 2025 5:54 UTC
2 points
0 comments · 2 min read · LW link

Civil Service: a Victim or a Villain?

Martin Sustrik · 7 Aug 2025 5:50 UTC
67 points
27 comments · 4 min read · LW link
(www.250bpm.com)

AXRP Episode 46 - Tom Davidson on AI-enabled Coups

DanielFilan · 7 Aug 2025 5:10 UTC
11 points
0 comments · 68 min read · LW link

A Cheeky Pint with Anthropic CEO Dario Amodei

WilliamKiely · 7 Aug 2025 3:21 UTC
10 points
3 comments · 1 min read · LW link

Reproducing Absolute Zero

Lucy Wingard · 7 Aug 2025 3:01 UTC
5 points
1 comment · 4 min read · LW link

Interview with Kelsey Piper on Self-Censorship and the Vibe Shift

Zack_M_Davis · 7 Aug 2025 2:51 UTC
57 points
1 comment · 15 min read · LW link
(unremediatedgender.space)

Forbes: Fear Of Super Intelligent AI Is Driving Harvard And MIT Students To Drop Out

Nikola Jurkovic · 7 Aug 2025 2:02 UTC
19 points
0 comments · 1 min read · LW link
(www.forbes.com)

Open weights != Open source

martinkunev · 7 Aug 2025 1:04 UTC
0 points
8 comments · 3 min read · LW link

No, Rationalism Is Not a Cult

Liam Robins · 7 Aug 2025 0:39 UTC
22 points
18 comments · 10 min read · LW link
(thelimestack.substack.com)

Critiquing the Dunning-Kruger Effect

Jennifer Young · 7 Aug 2025 0:36 UTC
0 points
0 comments · 1 min read · LW link

Re: recent Anthropic safety research

Eliezer Yudkowsky · 6 Aug 2025 22:52 UTC
145 points
22 comments · 5 min read · LW link
(x.com)

It’s Owl in the Numbers: Token Entanglement in Subliminal Learning

6 Aug 2025 22:18 UTC
38 points
7 comments · 4 min read · LW link

[Question] Inscrutability was always inevitable, right?

Steven Byrnes · 6 Aug 2025 21:57 UTC
99 points
33 comments · 2 min read · LW link