An ARENA 6.0 Capstone: Model Organism of Encoded Reasoning

5 Nov 2025 23:45 UTC
6 points
0 comments 9 min read LW link

Sentient Futures Summit 2026 Bay Area: Apply to Speak!

jonahmattwoodward 5 Nov 2025 23:39 UTC
1 point
0 comments 1 min read LW link

Breaking Books: A tool to bring books to the social sphere

Alexandre Variengien 5 Nov 2025 22:53 UTC
17 points
1 comment 8 min read LW link
(alexandrevariengien.com)

Digital minimalism is out, digital intentionality is in

mingyuan 5 Nov 2025 22:01 UTC
29 points
1 comment 2 min read LW link
(mingyuan.substack.com)

Anthropic Commits To Model Weight Preservation

Zvi 5 Nov 2025 21:30 UTC
84 points
13 comments 14 min read LW link
(thezvi.wordpress.com)

Meta-agentic Prisoner’s Dilemmas

TsviBT 5 Nov 2025 16:44 UTC
39 points
1 comment 5 min read LW link

Living in the Shadow of The Sort

Gordon Seidoh Worley 5 Nov 2025 16:31 UTC
23 points
5 comments 5 min read LW link
(www.uncertainupdates.com)

Hardening against AI takeover is difficult, but we should try

otto.barten 5 Nov 2025 16:25 UTC
11 points
0 comments 5 min read LW link
(www.existentialriskobservatory.org)

AI Safety at the Frontier: Paper Highlights of October 2025

gasteigerjo 5 Nov 2025 13:39 UTC
7 points
0 comments 8 min read LW link
(aisafetyfrontier.substack.com)

New homepage for AI safety resources – AISafety.com redesign

5 Nov 2025 10:33 UTC
35 points
2 comments 1 min read LW link

An atheist’s guide to prayer

Nathan Young 5 Nov 2025 9:51 UTC
18 points
3 comments 5 min read LW link
(open.substack.com)

Theory of Change for US Govt Whistleblower Database and Guide

samuelshadrach 5 Nov 2025 9:08 UTC
2 points
0 comments 14 min read LW link
(samuelshadrach.com)

AGI is building itself

Anonim Anonymous 5 Nov 2025 8:52 UTC
−9 points
1 comment 1 min read LW link

Suffering is what makes it special

Dentosal 5 Nov 2025 8:04 UTC
−2 points
1 comment 2 min read LW link

Maxwell’s Demon and the Arrow of Time

Adam Scherlis 5 Nov 2025 7:35 UTC
28 points
2 comments 6 min read LW link
(adam.scherl.is)

How to be convincing when talking to people about existential threat from AI

Mikhail Samin 5 Nov 2025 7:01 UTC
35 points
2 comments 5 min read LW link

[FICTION] Sable and Able: A Tale of Two ASIs

Mr Beastly 5 Nov 2025 6:18 UTC
−3 points
0 comments 18 min read LW link

Why Safety Constraints in LLMs Are Easily Breakable? Knowledge as a Network of Gated Circuits

Aditya Raj 5 Nov 2025 5:20 UTC
12 points
0 comments 4 min read LW link

Using math to foster acceptance and equality

jackoda 5 Nov 2025 5:16 UTC
−1 points
0 comments 1 min read LW link

Dario Amodei’s “Machines of Loving Grace” sounds incredibly dangerous, for Humans

Super AGI 5 Nov 2025 4:42 UTC
13 points
1 comment 1 min read LW link

What are you excited about doing?

mingyuan 5 Nov 2025 4:40 UTC
17 points
0 comments 2 min read LW link
(mingyuan.substack.com)

Intentionality

abramdemski 5 Nov 2025 4:30 UTC
30 points
4 comments 2 min read LW link

Food-related things that have made my life a little better

Philipreal 5 Nov 2025 3:47 UTC
7 points
1 comment 2 min read LW link

Gerrymandering California

Nisan 5 Nov 2025 2:46 UTC
14 points
0 comments 3 min read LW link

How to survive until AGI

Nikola Jurkovic 5 Nov 2025 1:17 UTC
28 points
3 comments 3 min read LW link
(nikolajurkovic.substack.com)

Heroic Responsibility

johnswentworth 4 Nov 2025 23:26 UTC
78 points
31 comments 2 min read LW link

[Linkpost] Competing Motivations: When More Incentives Lead To Less Effort

Gunnar_Zarncke 4 Nov 2025 23:02 UTC
11 points
0 comments 1 min read LW link
(x.com)

Not Over Or Under Indexed

Screwtape 4 Nov 2025 22:54 UTC
39 points
0 comments 6 min read LW link

Being “Usefully Concrete”

Raemon 4 Nov 2025 22:15 UTC
44 points
4 comments 4 min read LW link

Legible vs. Illegible AI Safety Problems

Wei Dai 4 Nov 2025 21:39 UTC
370 points
95 comments 2 min read LW link

Parsing Validation

Dentosal 4 Nov 2025 21:19 UTC
5 points
1 comment 3 min read LW link

A/B testing could lead LLMs to retain users instead of helping them

Daniel Paleka 4 Nov 2025 19:30 UTC
28 points
0 comments 4 min read LW link
(newsletter.danielpaleka.com)

OpenAI: The Battle of the Board: Ilya’s Testimony

Zvi 4 Nov 2025 19:30 UTC
44 points
1 comment 5 min read LW link
(thezvi.wordpress.com)

Berkeley Secular Solstice Weekend

Raemon 4 Nov 2025 18:37 UTC
21 points
18 comments 1 min read LW link

Modeling the geopolitics of AI development

4 Nov 2025 17:31 UTC
46 points
0 comments 2 min read LW link
(ai-scenarios.com)

Thoughts by a non-economist on AI and economics

Boaz Barak 4 Nov 2025 17:06 UTC
41 points
2 comments 14 min read LW link

GDM: Consistency Training Helps Limit Sycophancy and Jailbreaks in Gemini 2.5 Flash

4 Nov 2025 16:25 UTC
53 points
2 comments 6 min read LW link
(arxiv.org)

AI Safety Camp 11

4 Nov 2025 14:56 UTC
6 points
0 comments 15 min read LW link

Keeping Ants and Spotting Queens

Morpheus 4 Nov 2025 13:49 UTC
11 points
0 comments 2 min read LW link

Letter to a close friend

Alexandre Variengien 4 Nov 2025 13:17 UTC
9 points
0 comments 2 min read LW link
(alexandrevariengien.com)

Open-weight training practices and implications for CoT monitorability

4 Nov 2025 10:49 UTC
15 points
0 comments 9 min read LW link

Free Learning in Today’s Society: Some Personal Experiences and Reflections

L.M.Sherlock 4 Nov 2025 10:30 UTC
30 points
1 comment 41 min read LW link
(lmsherlock.substack.com)

A prayer for engaging in conflict

TsviBT 4 Nov 2025 8:19 UTC
68 points
0 comments 2 min read LW link

Rainbows, fractals, and crumpled paper: Hölder continuity

Adam Scherlis 4 Nov 2025 8:01 UTC
10 points
0 comments 3 min read LW link
(adam.scherl.is)

Taste of food

Mikhail Samin 4 Nov 2025 7:47 UTC
22 points
0 comments 3 min read LW link
(mikhailsamin.substack.com)

Retrospective on US govt whistleblower guide and DB

samuelshadrach 4 Nov 2025 7:30 UTC
4 points
0 comments 2 min read LW link
(samuelshadrach.com)

US Govt Whistleblower Guide

samuelshadrach 4 Nov 2025 7:22 UTC
1 point
6 comments 7 min read LW link
(samuelshadrach.com)

US Govt Whistleblower Database

samuelshadrach 4 Nov 2025 7:20 UTC
6 points
6 comments 33 min read LW link
(samuelshadrach.com)

The Mortifying Ordeal of Knowing Thyself

Philipreal 4 Nov 2025 5:16 UTC
6 points
0 comments 3 min read LW link

Build the life you actually want

mingyuan 4 Nov 2025 4:50 UTC
53 points
3 comments 3 min read LW link
(mingyuan.substack.com)