Thoughts on extrapolating time horizons

Nikola Jurkovic, 11 Aug 2025 22:36 UTC
53 points
7 comments, 1 min read, LW link
(x.com)

CoT May Be Highly Informative Despite “Unfaithfulness” [METR]

GradientDissenter, 11 Aug 2025 21:47 UTC
64 points
3 comments, 24 min read, LW link
(metr.org)

16 Concrete, Ambitious AI Project Proposals for Science and Security

Alejandro Acelas, 11 Aug 2025 20:33 UTC
13 points
0 comments, 1 min read, LW link
(ifp.org)

How Does A Blind Model See The Earth?

henry, 11 Aug 2025 19:58 UTC
474 points
38 comments, 7 min read, LW link
(outsidetext.substack.com)

How we spent our first two weeks as an independent AI safety research group

11 Aug 2025 19:32 UTC
28 points
0 comments, 10 min read, LW link

The Frustrations and Perils of Navigating Blind to Rocks

jimmy, 11 Aug 2025 19:03 UTC
5 points
0 comments, 7 min read, LW link

Negative utilitarianism is more intuitive than you think

Nina Panickssery, 11 Aug 2025 16:13 UTC
13 points
25 comments, 3 min read, LW link
(blog.ninapanickssery.com)

Dwarf Fortress and Claude’s ASCII Art Blindness

Brendan Long, 11 Aug 2025 16:05 UTC
16 points
1 comment, 3 min read, LW link
(www.brendanlong.com)

Alternative Models of Superposition

11 Aug 2025 15:52 UTC
15 points
6 comments, 5 min read, LW link

Ambition, Good and Bad: Green Growing Things and Forgeworthiness

Evenstar, 11 Aug 2025 15:20 UTC
10 points
0 comments, 5 min read, LW link

ARENA 5.0 Impact Report

11 Aug 2025 14:06 UTC
25 points
0 comments, 20 min read, LW link

GPT-5s Are Alive: Basic Facts, Benchmarks and the Model Card

Zvi, 11 Aug 2025 12:10 UTC
45 points
2 comments, 25 min read, LW link
(thezvi.wordpress.com)

The trajectory of the future could soon get set in stone

wdmacaskill, 11 Aug 2025 11:04 UTC
41 points
2 comments, 3 min read, LW link

Listening Before Speaking

Alice Blair, 11 Aug 2025 5:23 UTC
15 points
3 comments, 3 min read, LW link

Legal Personhood—Bundle Theory

Stephen Martin, 11 Aug 2025 4:32 UTC
3 points
2 comments, 3 min read, LW link

Measuring intelligence and reverse-engineering goals

jessicata, 11 Aug 2025 2:08 UTC
33 points
10 comments, 9 min read, LW link
(unstableontology.com)

The Necessity of Studying Emergent Machine Ethics Now

Hiroshi Yamakawa, 11 Aug 2025 0:37 UTC
3 points
0 comments, 11 min read, LW link

Run-time Steering Can Surpass Post-Training: Reasoning Task Performance

Tommy Xie, 10 Aug 2025 23:52 UTC
5 points
2 comments, 6 min read, LW link
(www.tutke.org)

Sturdier and Lighter Pedalboard

jefftk, 10 Aug 2025 23:50 UTC
9 points
0 comments, 2 min read, LW link
(www.jefftk.com)

Unjournal evaluation of “Towards best practices in AGI safety & governance” (2023), quick take

david reinstein, 10 Aug 2025 22:28 UTC
7 points
2 comments, 1 min read, LW link
(unjournal.pubpub.org)

My Least Libertarian Opinion: Ban Exclusivity Deals*

Brendan Long, 10 Aug 2025 21:41 UTC
78 points
17 comments, 2 min read, LW link
(www.brendanlong.com)

Motivated Reasoning as Bias

oleg, 10 Aug 2025 21:15 UTC
6 points
2 comments, 3 min read, LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward, 10 Aug 2025 20:56 UTC
1 point
0 comments, 1 min read, LW link

LLMs play prisoner’s Dilemma

parthh01, 10 Aug 2025 20:36 UTC
2 points
0 comments, 1 min read, LW link

Petrov Day: Bremen (Oct 10)

10 Aug 2025 19:09 UTC
3 points
1 comment, 1 min read, LW link

The Coding Theorem — A Link between Complexity and Probability

Leon Lang, 10 Aug 2025 15:34 UTC
32 points
4 comments, 9 min read, LW link

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo, 10 Aug 2025 12:49 UTC
7 points
0 comments, 9 min read, LW link
(aisafetyfrontier.substack.com)

From Oragnized Shelves to Layered Catalogs: Architectural Explorations for Sparse Autoencoders—Crosscoders & Ladder SAEs Towards Hierarchical Data Structure

Yuxiao, 10 Aug 2025 10:12 UTC
2 points
0 comments, 11 min read, LW link

Legal Personhood for Digital Minds—Introduction

Stephen Martin, 10 Aug 2025 9:29 UTC
5 points
4 comments, 2 min read, LW link

Breaking the Cycle of Trauma and Tyranny: How Psychological Wounds Shape History

Dawn Drescher, 10 Aug 2025 8:46 UTC
42 points
6 comments, 12 min read, LW link
(impartial-priorities.org)

Having children is not the most effective way to improve the world. Have them because you want them, not “for impact”.

KatWoods, 10 Aug 2025 6:54 UTC
12 points
2 comments, 2 min read, LW link

A Self-Dialogue on The Value Proposition of Romantic Relationships

johnswentworth, 10 Aug 2025 1:28 UTC
35 points
71 comments, 8 min read, LW link

GPT-5 writing a Singularity scenario

Trevor Cappallo, 10 Aug 2025 0:56 UTC
25 points
7 comments, 34 min read, LW link

[Question] Linkable images in the editor?

Brendan Long, 10 Aug 2025 0:34 UTC
9 points
4 comments, 1 min read, LW link

Four places where you can put LLM monitoring

9 Aug 2025 23:10 UTC
48 points
0 comments, 7 min read, LW link

Output and CoE Monitoring of Customer Service Representatives Shows Default Alignment

Brendan Long, 9 Aug 2025 21:31 UTC
21 points
0 comments, 1 min read, LW link

Live by the Claude, Die by the Claude

Brendan McCord, 9 Aug 2025 20:23 UTC
0 points
3 comments, 7 min read, LW link
(blog.cosmos-institute.org)

GPT-5 vs AI Alignment

Donatas Lučiūnas, 9 Aug 2025 20:05 UTC
−8 points
2 comments, 1 min read, LW link

Saidi, My Friend—what do we owe to each other?

James Stephen Brown, 9 Aug 2025 19:41 UTC
10 points
0 comments, 5 min read, LW link

Self-Inquiry (Самовопрошание)

Vadim Golub, 9 Aug 2025 19:18 UTC
−6 points
0 comments, 1 min read, LW link

Testing the Authoritarian Bias of LLMs

9 Aug 2025 18:09 UTC
9 points
1 comment, 6 min read, LW link

Working with AI: Measuring the Occupational Implications of Generative AI

Annapurna, 9 Aug 2025 16:20 UTC
5 points
0 comments, 1 min read, LW link
(jorgevelez.substack.com)

If worker coops are so productive, why aren’t they everywhere?

B Jacobs, 9 Aug 2025 14:47 UTC
35 points
19 comments, 4 min read, LW link
(bobjacobs.substack.com)

Steganography via internal activations is already possible in small language models — a potential first step toward persistent hidden reasoning.

9 Aug 2025 11:44 UTC
7 points
0 comments, 12 min read, LW link

Against functionalism: a self dialogue

Algon, 9 Aug 2025 11:19 UTC
13 points
9 comments, 1 min read, LW link

With the Future of the World in Your Hands, Think for 6.77 Years!

Dawn Drescher, 9 Aug 2025 10:44 UTC
1 point
0 comments, 10 min read, LW link
(impartial-priorities.org)

Poll on De/Accelerating AI

denkenberger, 9 Aug 2025 7:13 UTC
13 points
38 comments, 1 min read, LW link

[Event] Building What the Future Needs: A curated conference in Berlin (Sep 6, 2025) for high-impact builders and researchers

Vasilii Kondyrev, 8 Aug 2025 23:08 UTC
7 points
0 comments, 2 min read, LW link

Memory Decoding Journal Club: The dendritic engram

Devin Ward, 8 Aug 2025 22:08 UTC
1 point
0 comments, 1 min read, LW link

Making Sense of Consciousness Part 4: States of Consciousness

sarahconstantin, 8 Aug 2025 21:21 UTC
8 points
0 comments, 5 min read, LW link
(sarahconstantin.substack.com)