Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

Tommy Xie · 10 Aug 2025 23:52 UTC
5 points
2 comments · 6 min read · LW link
(www.tutke.org)

Sturdier and Lighter Pedalboard

jefftk · 10 Aug 2025 23:50 UTC
9 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Unjournal evaluation of “Towards best practices in AGI safety & governance” (2023), quick take

david reinstein · 10 Aug 2025 22:28 UTC
7 points
2 comments · 1 min read · LW link
(unjournal.pubpub.org)

My Least Libertarian Opinion: Ban Exclusivity Deals*

Brendan Long · 10 Aug 2025 21:41 UTC
78 points
17 comments · 2 min read · LW link
(www.brendanlong.com)

Motivated Reasoning as Bias

oleg · 10 Aug 2025 21:15 UTC
6 points
2 comments · 3 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward · 10 Aug 2025 20:56 UTC
1 point
0 comments · 1 min read · LW link

LLMs play prisoner’s Dilemma

parthh01 · 10 Aug 2025 20:36 UTC
2 points
0 comments · 1 min read · LW link

Petrov Day: Bremen (Oct 10)

10 Aug 2025 19:09 UTC
3 points
1 comment · 1 min read · LW link

The Coding Theorem — A Link between Complexity and Probability

Leon Lang · 10 Aug 2025 15:34 UTC
32 points
4 comments · 9 min read · LW link

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo · 10 Aug 2025 12:49 UTC
7 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

From Oragnized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders—Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

Yuxiao · 10 Aug 2025 10:12 UTC
2 points
0 comments · 11 min read · LW link

Legal Personhood for Digital Minds—Introduction

Stephen Martin · 10 Aug 2025 9:29 UTC
5 points
4 comments · 2 min read · LW link

Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History

Dawn Drescher · 10 Aug 2025 8:46 UTC
42 points
6 comments · 12 min read · LW link
(impartial-priorities.org)

Having children is not the most effective way to improve the world. Have them because you want them, not “for impact”.

KatWoods · 10 Aug 2025 6:54 UTC
12 points
2 comments · 2 min read · LW link

A Self-Dialogue on The Value Proposition of Romantic Relationships

johnswentworth · 10 Aug 2025 1:28 UTC
35 points
71 comments · 8 min read · LW link

GPT-5 writing a Singularity scenario

Trevor Cappallo · 10 Aug 2025 0:56 UTC
25 points
7 comments · 34 min read · LW link

[Question] Linkable images in the editor?

Brendan Long · 10 Aug 2025 0:34 UTC
9 points
4 comments · 1 min read · LW link

Four places where you can put LLM monitoring

9 Aug 2025 23:10 UTC
48 points
0 comments · 7 min read · LW link

Output and CoE Monitoring of Customer Service Representatives Shows Default Alignment

Brendan Long · 9 Aug 2025 21:31 UTC
21 points
0 comments · 1 min read · LW link

Live by the Claude, Die by the Claude

Brendan McCord · 9 Aug 2025 20:23 UTC
0 points
3 comments · 7 min read · LW link
(blog.cosmos-institute.org)

GPT-5 vs AI Alignment

Donatas Lučiūnas · 9 Aug 2025 20:05 UTC
−8 points
2 comments · 1 min read · LW link

Saidi, My Friend—what do we owe to each other?

James Stephen Brown · 9 Aug 2025 19:41 UTC
10 points
0 comments · 5 min read · LW link

Самовопрошание (Self-Inquiry)

Vadim Golub · 9 Aug 2025 19:18 UTC
−6 points
0 comments · 1 min read · LW link

Testing the Authoritarian Bias of LLMs

9 Aug 2025 18:09 UTC
9 points
1 comment · 6 min read · LW link

Working with AI: Measuring the Occupational Implications of Generative AI

Annapurna · 9 Aug 2025 16:20 UTC
5 points
0 comments · 1 min read · LW link
(jorgevelez.substack.com)

If worker coops are so productive, why aren’t they everywhere?

B Jacobs · 9 Aug 2025 14:47 UTC
35 points
19 comments · 4 min read · LW link
(bobjacobs.substack.com)

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

9 Aug 2025 11:44 UTC
7 points
0 comments · 12 min read · LW link

Against functionalism: a self dialogue

Algon · 9 Aug 2025 11:19 UTC
13 points
9 comments · 1 min read · LW link

With the Future of the World in Your Hands, Think for 6.77 Years!

Dawn Drescher · 9 Aug 2025 10:44 UTC
1 point
0 comments · 10 min read · LW link
(impartial-priorities.org)

Poll on De/Accelerating AI

denkenberger · 9 Aug 2025 7:13 UTC
13 points
38 comments · 1 min read · LW link

[Event] Building What the Future Needs: A curated conference in Berlin (Sep 6, 2025) for high-impact builders and researchers

Vasilii Kondyrev · 8 Aug 2025 23:08 UTC
7 points
0 comments · 2 min read · LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward · 8 Aug 2025 22:08 UTC
1 point
0 comments · 1 min read · LW link

Making Sense of Consciousness Part 4: States of Consciousness

sarahconstantin · 8 Aug 2025 21:21 UTC
8 points
0 comments · 5 min read · LW link
(sarahconstantin.substack.com)

What would a human pretending to be an AI say?

Brendan Long · 8 Aug 2025 18:56 UTC
53 points
18 comments · 1 min read · LW link
(www.brendanlong.com)

Will morally motivated actors steer us towards a near-best future?

wdmacaskill · 8 Aug 2025 18:32 UTC
22 points
0 comments · 4 min read · LW link

How hard to achieve is eutopia?

wdmacaskill · 8 Aug 2025 16:16 UTC
22 points
0 comments · 7 min read · LW link

OpenAI’s GPT-OSS Is Already Old News

Zvi · 8 Aug 2025 12:20 UTC
39 points
4 comments · 18 min read · LW link
(thezvi.wordpress.com)

Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitor Performance (Research Note)

8 Aug 2025 10:41 UTC
51 points
7 comments · 10 min read · LW link

The Tortoise and the Language Model (A Fable After Hofstadter)

mwatkins · 8 Aug 2025 10:39 UTC
54 points
4 comments · 3 min read · LW link

Closed Mouth, Open Oppurtunities

CstineSublime · 8 Aug 2025 10:32 UTC
6 points
0 comments · 4 min read · LW link

How anticipatory cover-ups go wrong

Kaj_Sotala · 8 Aug 2025 10:26 UTC
295 points
25 comments · 6 min read · LW link

Strategic Moderation Goals (a Plan B to AI alignment)

Jim Buhler · 8 Aug 2025 8:08 UTC
2 points
0 comments · 3 min read · LW link

METR’s Evaluation of GPT-5

GradientDissenter · 7 Aug 2025 22:17 UTC
141 points
15 comments · 20 min read · LW link
(metr.github.io)

ChatGPT is the Daguerreotype of AI

Alex_Altair · 7 Aug 2025 22:14 UTC
42 points
2 comments · 7 min read · LW link

Principles of AI Uncontrollability

WillPetillo · 7 Aug 2025 21:10 UTC
1 point
0 comments · 7 min read · LW link

Third-order cognition as a model of superintelligence (ironically: Meta® metacognition)

soycarts · 7 Aug 2025 20:56 UTC
2 points
5 comments · 13 min read · LW link

Yes, Rationalism is a Cult

James Camacho · 7 Aug 2025 20:43 UTC
−14 points
23 comments · 4 min read · LW link

GPT-5 is out

david reinstein · 7 Aug 2025 20:33 UTC
4 points
0 comments · 1 min read · LW link
(openai.com)

OpenAI Releases GPT-5

anaguma · 7 Aug 2025 18:41 UTC
18 points
0 comments · 1 min read · LW link
(openai.com)

Balancing exploration and resistance to memetic threats after AGI

Eric Neyman · 7 Aug 2025 18:03 UTC
26 points
5 comments · 5 min read · LW link