Beware LLMs’ pathological guardrailing

lc

19 Sep 2025 20:55 UTC

22 points

1 comment · 1 min read · LW link

Safety researchers should take a public stance

Mateusz Bagiński and Ishual

19 Sep 2025 18:55 UTC

254 points

65 comments · 8 min read · LW link

Day 16 Hunger Strike—Guido Reichstader Interviewed

samuelshadrach

19 Sep 2025 17:30 UTC

9 points

0 comments · 1 min read · LW link

Prospects for studying actual schemers

ryan_greenblatt and Julian Stastny

19 Sep 2025 14:11 UTC

40 points

2 comments · 58 min read · LW link

Book Review: If Anyone Builds It, Everyone Dies

Zvi

19 Sep 2025 11:30 UTC

66 points

3 comments · 31 min read · LW link

(thezvi.wordpress.com)

How people politically confront the Modern Eldritch

PranavG and Gabriel Alfour

19 Sep 2025 10:18 UTC

11 points

0 comments · 14 min read · LW link

(cognition.cafe)

My Minor AI Safety Research Projects (Q3 2025)

Adam Newgas

19 Sep 2025 9:53 UTC

6 points

1 comment · 2 min read · LW link

Book Review: If Anyone Builds It, Everyone Dies

Nina Panickssery

19 Sep 2025 4:50 UTC

49 points

1 comment · 11 min read · LW link

(blog.ninapanickssery.com)

Memory Decoding Journal Club: Distinct synaptic plasticity rules operate across dendritic compartments in vivo during learning

Devin Ward

19 Sep 2025 4:17 UTC

3 points

0 comments · 1 min read · LW link

AI psychosis isn’t really psychosis

GGWG

19 Sep 2025 3:18 UTC

6 points

2 comments · 1 min read · LW link

JDP Reviews IABIED

jdp

19 Sep 2025 1:23 UTC

89 points

21 comments · 8 min read · LW link

(minihf.com)

Teaching My Toddler To Read

maia

19 Sep 2025 0:17 UTC

159 points

21 comments · 10 min read · LW link

IABIED Review—An Unfortunate Miss

Darren McKee

18 Sep 2025 22:39 UTC

65 points

22 comments · 9 min read · LW link

You can’t eval GPT5 anymore

Lukas Petersson

18 Sep 2025 22:12 UTC

169 points

15 comments · 1 min read · LW link

Oxford – ACX Meetups Everywhere Fall 2025

fenmund and Sam F. Brown

18 Sep 2025 20:22 UTC

1 point

0 comments · 1 min read · LW link

If anyone builds it, everyone will plausibly be fine

joshc

18 Sep 2025 20:03 UTC

32 points

24 comments · 7 min read · LW link

It Never Worked Before: Nine Intellectual Jokes

Linch

18 Sep 2025 19:48 UTC

13 points

2 comments · 2 min read · LW link

(linch.substack.com)

An Attempt to Explain my AI Risk Explainer Attempt

thenoviceoof

18 Sep 2025 19:35 UTC

11 points

2 comments · 10 min read · LW link

(thenoviceoof.com)

More Was Possible: A Review of IABIED

Vaniver

18 Sep 2025 19:33 UTC

55 points

5 comments · 1 min read · LW link

(asteriskmag.com)

Can an AI become human?

Robert Shuler

18 Sep 2025 19:18 UTC

3 points

0 comments · 8 min read · LW link

The Strange Case of Emergent Misalignment

Alexander Müller and ilijalichkovski

18 Sep 2025 14:45 UTC

2 points

0 comments · 5 min read · LW link

AI #134: If Anyone Reads It

Zvi

18 Sep 2025 13:10 UTC

35 points

8 comments · 61 min read · LW link

(thezvi.wordpress.com)

These are my reasons to worry less about loss of control over LLM-based agents

otto.barten

18 Sep 2025 11:45 UTC

7 points

6 comments · 4 min read · LW link

The End-of-the-World Party

Jakub Growiec

18 Sep 2025 7:49 UTC

2 points

0 comments · 52 min read · LW link

Ontologies of the Artificial

snav

18 Sep 2025 1:32 UTC

11 points

2 comments · 7 min read · LW link

UC Berkeley::Cassandra’s Circle Virtual Reading Group for: “If Anyone Builds It”

saifrahmed

18 Sep 2025 1:28 UTC

11 points

0 comments · 1 min read · LW link

Meetup Month

Raemon

17 Sep 2025 21:10 UTC

45 points

10 comments · 3 min read · LW link

A Cheaper Way to Test Ventilation Rates?

casualphysicsenjoyer

17 Sep 2025 21:10 UTC

18 points

1 comment · 4 min read · LW link

(chillphysicsenjoyer.substack.com)

Reactions to If Anyone Builds It, Anyone Dies

Zvi

17 Sep 2025 20:00 UTC

62 points

1 comment · 13 min read · LW link

(thezvi.wordpress.com)

How To Dress To Improve Your Epistemics

johnswentworth

17 Sep 2025 19:28 UTC

35 points

60 comments · 6 min read · LW link

AISafety.com Reading Group session 327

Søren Elverlin

17 Sep 2025 18:20 UTC

13 points

3 comments · 1 min read · LW link

The Company Man

Tomás B.

17 Sep 2025 17:47 UTC

830 points

79 comments · 18 min read · LW link

Legal Personhood—Guardianship and the Age of Majority

Stephen Martin

17 Sep 2025 17:14 UTC

4 points

0 comments · 5 min read · LW link

Stress Testing Deliberative Alignment for Anti-Scheming Training

Mikita Balesni, Bronson Schoen, Marius Hobbhahn, Axel Højmark, AlexMeinke, Teun van der Weij, Jérémy Scheurer, Felix Hofstätter, Nicholas Goldowsky-Dill, rusheb, Andrei Matveiakin, jenny and alex.lloyd

17 Sep 2025 16:59 UTC

133 points

19 comments · 1 min read · LW link

(antischeming.ai)

LLMs Don’t Know Their Own Decision Boundaries. Why Is This Important?

harrymayne and ryanothnielkearns

17 Sep 2025 16:39 UTC

9 points

0 comments · 5 min read · LW link

(arxiv.org)

Software Engineering Leadership in Flux

Gordon Seidoh Worley

17 Sep 2025 16:11 UTC

66 points

6 comments · 1 min read · LW link

(uncertainupdates.substack.com)

Proof Section to Crisp Supra-Decision Processes

Brittany Gelb

17 Sep 2025 15:57 UTC

4 points

0 comments · 3 min read · LW link

Crisp Supra-Decision Processes

Brittany Gelb

17 Sep 2025 15:56 UTC

42 points

4 comments · 17 min read · LW link

Commentary on SSC’s In the Balance

PatrickDFarley

17 Sep 2025 15:49 UTC

12 points

0 comments · 8 min read · LW link

What training data should developers filter to reduce risk from misaligned AI? An initial narrow proposal

Alek Westover

17 Sep 2025 15:30 UTC

44 points

4 comments · 18 min read · LW link

Inference costs for hard coding tasks halve roughly every two months

Håvard Tveit Ihle

17 Sep 2025 15:04 UTC

16 points

0 comments · 4 min read · LW link

Christian homeschoolers in the year 3000

Buck

17 Sep 2025 14:44 UTC

207 points

65 comments · 7 min read · LW link

Visual Exploration of Gradient Descent (many images)

silentbob

17 Sep 2025 13:09 UTC

40 points

9 comments · 20 min read · LW link

The Center for AI Policy Has Shut Down

T_W

17 Sep 2025 11:04 UTC

95 points

2 comments · 14 min read · LW link

A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5

Kirill Dubovikov

17 Sep 2025 5:54 UTC

5 points

2 comments · 8 min read · LW link

I enjoyed most of IABIED

Buck

17 Sep 2025 4:34 UTC

210 points

46 comments · 8 min read · LW link

AR Might be the Key to BCI (and eventually, Emulation)

ixotope

17 Sep 2025 0:46 UTC

4 points

0 comments · 10 min read · LW link

(ixotopic.substack.com)

Emergent misalignment as contextual role inference

Helen.ix

17 Sep 2025 0:44 UTC

4 points

0 comments · 6 min read · LW link

Don’t talk about the AGI control problem

jakob.stenseke@gmail.com

17 Sep 2025 0:42 UTC

2 points

0 comments · 1 min read · LW link

(link.springer.com)

10/09/25 IABIED Q&A with Nate Soares in SF

coponder

17 Sep 2025 0:00 UTC

2 points

0 comments · 1 min read · LW link