Beware LLMs’ pathological guardrailing

lc · 19 Sep 2025 20:55 UTC
20 points
1 comment · 1 min read · LW link

Safety researchers should take a public stance

19 Sep 2025 18:55 UTC
230 points
65 comments · 8 min read · LW link

Day 16 Hunger Strike—Guido Reichstader Interviewed

samuelshadrach · 19 Sep 2025 17:30 UTC
9 points
0 comments · 1 min read · LW link

Prospects for studying actual schemers

19 Sep 2025 14:11 UTC
40 points
0 comments · 58 min read · LW link

Book Review: If Anyone Builds It, Everyone Dies

Zvi · 19 Sep 2025 11:30 UTC
61 points
3 comments · 31 min read · LW link
(thezvi.wordpress.com)

How people politically confront the Modern Eldritch

19 Sep 2025 10:18 UTC
5 points
0 comments · 14 min read · LW link
(cognition.cafe)

My Minor AI Safety Research Projects (Q3 2025)

Adam Newgas · 19 Sep 2025 9:53 UTC
6 points
1 comment · 2 min read · LW link

Book Review: If Anyone Builds It, Everyone Dies

Nina Panickssery · 19 Sep 2025 4:50 UTC
41 points
1 comment · 11 min read · LW link
(blog.ninapanickssery.com)

Memory Decoding Journal Club: Distinct synaptic plasticity rules operate across dendritic compartments in vivo during learning

Devin Ward · 19 Sep 2025 4:17 UTC
3 points
0 comments · 1 min read · LW link

AI psychosis isn’t really psychosis

GGWG · 19 Sep 2025 3:18 UTC
5 points
1 comment · 1 min read · LW link

JDP Reviews IABIED

jdp · 19 Sep 2025 1:23 UTC
74 points
21 comments · 8 min read · LW link
(minihf.com)

Teaching My Toddler To Read

maia · 19 Sep 2025 0:17 UTC
155 points
17 comments · 10 min read · LW link

IABIED Review—An Unfortunate Miss

Darren McKee · 18 Sep 2025 22:39 UTC
65 points
22 comments · 9 min read · LW link

You can’t eval GPT5 anymore

Lukas Petersson · 18 Sep 2025 22:12 UTC
152 points
11 comments · 1 min read · LW link

Oxford – ACX Meetups Everywhere Fall 2025

18 Sep 2025 20:22 UTC
1 point
0 comments · 1 min read · LW link

If anyone builds it, everyone will plausibly be fine

joshc · 18 Sep 2025 20:03 UTC
29 points
24 comments · 7 min read · LW link

It Never Worked Before: Nine Intellectual Jokes

Linch · 18 Sep 2025 19:48 UTC
13 points
2 comments · 2 min read · LW link
(linch.substack.com)

An Attempt to Explain my AI Risk Explainer Attempt

thenoviceoof · 18 Sep 2025 19:35 UTC
11 points
2 comments · 10 min read · LW link
(thenoviceoof.com)

More Was Possible: A Review of IABIED

Vaniver · 18 Sep 2025 19:33 UTC
54 points
5 comments · 1 min read · LW link
(asteriskmag.com)

Can an AI become human?

Robert Shuler · 18 Sep 2025 19:18 UTC
3 points
0 comments · 19 min read · LW link

The Strange Case of Emergent Misalignment

18 Sep 2025 14:45 UTC
2 points
0 comments · 5 min read · LW link

AI #134: If Anyone Reads It

Zvi · 18 Sep 2025 13:10 UTC
34 points
8 comments · 61 min read · LW link
(thezvi.wordpress.com)

These are my reasons to worry less about loss of control over LLM-based agents

otto.barten · 18 Sep 2025 11:45 UTC
7 points
4 comments · 4 min read · LW link

The End-of-the-World Party

Jakub Growiec · 18 Sep 2025 7:49 UTC
1 point
0 comments · 53 min read · LW link

Ontologies of the Artificial

snav · 18 Sep 2025 1:32 UTC
11 points
2 comments · 7 min read · LW link

UC Berkeley::Cassandra’s Circle Virtual Reading Group for: “If Anyone Builds It”

saifrahmed · 18 Sep 2025 1:28 UTC
11 points
0 comments · 1 min read · LW link

[Question] Betting on gods: Seeking Essential Self-Assessment Questions for Reducing Cognitive Biases

P. João · 17 Sep 2025 21:46 UTC
3 points
0 comments · 2 min read · LW link

Meetup Month

Raemon · 17 Sep 2025 21:10 UTC
45 points
10 comments · 3 min read · LW link

A Cheaper Way to Test Ventilation Rates?

casualphysicsenjoyer · 17 Sep 2025 21:10 UTC
18 points
1 comment · 4 min read · LW link
(chillphysicsenjoyer.substack.com)

Reactions to If Anyone Builds It, Anyone Dies

Zvi · 17 Sep 2025 20:00 UTC
59 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

How To Dress To Improve Your Epistemics

johnswentworth · 17 Sep 2025 19:28 UTC
37 points
58 comments · 6 min read · LW link

AISafety.com Reading Group session 327

Søren Elverlin · 17 Sep 2025 18:20 UTC
13 points
3 comments · 1 min read · LW link

The Company Man

Tomás B. · 17 Sep 2025 17:47 UTC
688 points
63 comments · 18 min read · LW link

Legal Personhood—Guardianship and the Age of Majority

Stephen Martin · 17 Sep 2025 17:14 UTC
4 points
0 comments · 5 min read · LW link

Stress Testing Deliberative Alignment for Anti-Scheming Training

17 Sep 2025 16:59 UTC
124 points
13 comments · 1 min read · LW link
(antischeming.ai)

LLMs Don’t Know Their Own Decision Boundaries. Why Is This Important?

17 Sep 2025 16:39 UTC
8 points
0 comments · 5 min read · LW link
(arxiv.org)

Software Engineering Leadership in Flux

Gordon Seidoh Worley · 17 Sep 2025 16:11 UTC
65 points
6 comments · 1 min read · LW link
(uncertainupdates.substack.com)

Proof Section to Crisp Supra-Decision Processes

Brittany Gelb · 17 Sep 2025 15:57 UTC
4 points
0 comments · 3 min read · LW link

Crisp Supra-Decision Processes

Brittany Gelb · 17 Sep 2025 15:56 UTC
34 points
0 comments · 17 min read · LW link

Commentary on SSC’s In the Balance

PatrickDFarley · 17 Sep 2025 15:49 UTC
12 points
0 comments · 8 min read · LW link

What training data should developers filter to reduce risk from misaligned AI? An initial narrow proposal

Alek Westover · 17 Sep 2025 15:30 UTC
32 points
1 comment · 18 min read · LW link

Inference costs for hard coding tasks halve roughly every two months

Håvard Tveit Ihle · 17 Sep 2025 15:04 UTC
15 points
0 comments · 4 min read · LW link

Christian homeschoolers in the year 3000

Buck · 17 Sep 2025 14:44 UTC
190 points
64 comments · 7 min read · LW link

Visual Exploration of Gradient Descent (many images)

silentbob · 17 Sep 2025 13:09 UTC
38 points
9 comments · 20 min read · LW link

The Center for AI Policy Has Shut Down

Tristan Williams · 17 Sep 2025 11:04 UTC
94 points
2 comments · 14 min read · LW link

A Steering Vector for SQL Injection Vulnerabilities in Phi-1.5

Kirill Dubovikov · 17 Sep 2025 5:54 UTC
5 points
1 comment · 8 min read · LW link

I enjoyed most of IABIED

Buck · 17 Sep 2025 4:34 UTC
207 points
46 comments · 8 min read · LW link

AR Might be the Key to BCI (and eventually, Emulation)

ixotope · 17 Sep 2025 0:46 UTC
3 points
0 comments · 10 min read · LW link
(ixotopic.substack.com)

Don’t talk about the AGI control problem

jakob.stenseke@gmail.com · 17 Sep 2025 0:42 UTC
2 points
0 comments · 1 min read · LW link
(link.springer.com)

10/09/25 IABIED Q&A with Nate Soares in SF

RobinGoins · 17 Sep 2025 0:00 UTC
2 points
0 comments · 1 min read · LW link