Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
35 points
8 comments · 6 min read · LW link

Agentic Misalignment: How LLMs Could be Insider Threats

20 Jun 2025 22:34 UTC
83 points
13 comments · 6 min read · LW link

Clarifying “wisdom”: Foundational topics for aligned AIs to prioritize before irreversible decisions

Anthony DiGiovanni · 20 Jun 2025 21:55 UTC
40 points
2 comments · 12 min read · LW link

Are Intelligent Agents More Ethical?

PeterMcCluskey · 20 Jun 2025 21:26 UTC
13 points
7 comments · 2 min read · LW link

An AI Arms Race Scenario

shanzson · 20 Jun 2025 19:25 UTC
2 points
2 comments · 1 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
127 points
41 comments · 15 min read · LW link

Ivan Gayton: A Right and a Duty

Elizabeth · 20 Jun 2025 18:20 UTC
21 points
0 comments · 1 min read · LW link
(acesounderglass.com)

What is the functional role of SAE errors?

20 Jun 2025 18:11 UTC
12 points
6 comments · 38 min read · LW link

Musings on AI Companies of 2025-2026 (Jun 2025)

Vladimir_Nesov · 20 Jun 2025 17:14 UTC
66 points
4 comments · 3 min read · LW link

Escaping the Jungles of Norwood: A Rationalist’s Guide to Male Pattern Baldness

AlphaAndOmega · 20 Jun 2025 16:40 UTC
12 points
10 comments · 1 min read · LW link
(open.substack.com)

Prefix cache untrusted monitors: a method to apply after you catch your AI

ryan_greenblatt · 20 Jun 2025 15:56 UTC
33 points
2 comments · 7 min read · LW link

Did the Army Poison a Bunch of Women in Minnesota?

rba · 20 Jun 2025 15:33 UTC
54 points
2 comments · 4 min read · LW link

AI #121 Part 2: The OpenAI Files

Zvi · 20 Jun 2025 14:50 UTC
37 points
9 comments · 41 min read · LW link
(thezvi.wordpress.com)

Smarter Models Lie Less

Expertium · 20 Jun 2025 13:31 UTC
6 points
0 comments · 2 min read · LW link

AI Safety Communicators Meet-up

Vishakha · 20 Jun 2025 12:34 UTC
3 points
0 comments · 1 min read · LW link

X explains Z% of the variance in Y

Leon Lang · 20 Jun 2025 12:17 UTC
160 points
36 comments · 9 min read · LW link

Yes RAND, AI Could Really Cause Human Extinction [crosspost]

otto.barten · 20 Jun 2025 11:42 UTC
17 points
4 comments · 4 min read · LW link
(www.existentialriskobservatory.org)

Misalignment or misuse? The AGI alignment tradeoff

Max_He-Ho · 20 Jun 2025 10:43 UTC
3 points
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Paphos

Yudhister Kumar · 20 Jun 2025 9:25 UTC
4 points
0 comments · 1 min read · LW link
(yudhister.me)

Rome

Yudhister Kumar · 20 Jun 2025 9:23 UTC
3 points
0 comments · 2 min read · LW link
(yudhister.me)

Geneva

Yudhister Kumar · 20 Jun 2025 9:22 UTC
4 points
0 comments · 1 min read · LW link
(yudhister.me)

Toledo

Yudhister Kumar · 20 Jun 2025 9:18 UTC
3 points
0 comments · 2 min read · LW link
(www.yudhister.me)

Graphing AI economic growth rates, or time to Dyson Swarm

denkenberger · 20 Jun 2025 7:00 UTC
4 points
2 comments · 1 min read · LW link

the silk pajamas effect

thiccythot · 20 Jun 2025 3:31 UTC
41 points
11 comments · 4 min read · LW link

Change And Identity: a Story and Discussion on the Evolving Self

Rob Lucas · 20 Jun 2025 1:44 UTC
0 points
0 comments · 19 min read · LW link
(open.substack.com)

Moving Past the Question of Consciousness: A Thought Experiment

Satya Benson · 19 Jun 2025 19:52 UTC
12 points
8 comments · 2 min read · LW link
(satchlj.com)

S-Expressions as a Design Language: A Tool for Deconfusion in Alignment

Johannes C. Mayer · 19 Jun 2025 19:03 UTC
5 points
0 comments · 6 min read · LW link

AISEC: Why not to be shy.

xen9 · 19 Jun 2025 18:16 UTC
4 points
1 comment · 1 min read · LW link

LLMs as amplifiers, not assistants

Caleb Biddulph · 19 Jun 2025 17:21 UTC
27 points
8 comments · 7 min read · LW link

How The Singer Sang His Tales

adamShimi · 19 Jun 2025 17:06 UTC
18 points
0 comments · 36 min read · LW link
(formethods.substack.com)

Key paths, plans and strategies to AI safety success

Adam Jones · 19 Jun 2025 16:56 UTC
13 points
0 comments · 6 min read · LW link
(bluedot.org)

AI safety techniques leveraging distillation

ryan_greenblatt · 19 Jun 2025 14:31 UTC
61 points
0 comments · 12 min read · LW link

Political Funding Expertise (Post 6 of 7 on AI Governance)

Mass_Driver · 19 Jun 2025 14:14 UTC
59 points
4 comments · 14 min read · LW link

Documents Are Dead. Long Live the Conversational Proxy.

8harath · 19 Jun 2025 14:01 UTC
−9 points
1 comment · 1 min read · LW link

[Question] How did you find out about AI Safety? Why and how did you get involved?

Ana Lopez · 19 Jun 2025 14:00 UTC
1 point
0 comments · 1 min read · LW link

A deep critique of AI 2027’s bad timeline models

titotal · 19 Jun 2025 13:29 UTC
372 points
40 comments · 39 min read · LW link
(titotal.substack.com)

AI #121 Part 1: New Connections

Zvi · 19 Jun 2025 13:00 UTC
32 points
12 comments · 39 min read · LW link
(thezvi.wordpress.com)

AI can win a conflict against us

19 Jun 2025 7:20 UTC
6 points
0 comments · 2 min read · LW link

Different goals may bring AI into conflict with us

19 Jun 2025 7:19 UTC
5 points
2 comments · 2 min read · LW link

My Failed AI Safety Research Projects (Q1/Q2 2025)

Adam Newgas · 19 Jun 2025 3:55 UTC
26 points
3 comments · 3 min read · LW link

TT Self Study Journal # 1

TristanTrim · 18 Jun 2025 23:36 UTC
8 points
6 comments · 6 min read · LW link

On May 1, 2033, humanity discovered that AI was fairly easy to align.

Yitz · 18 Jun 2025 19:57 UTC
10 points
3 comments · 1 min read · LW link

New Ethics for the AI Age

Matthieu Tehenan · 18 Jun 2025 19:30 UTC
1 point
0 comments · 6 min read · LW link

Gemini 2.5 Pro: From 0506 to 0605

Zvi · 18 Jun 2025 19:10 UTC
33 points
0 comments · 8 min read · LW link
(thezvi.wordpress.com)

Factored Cognition Strengthens Monitoring and Thwarts Attacks

Aaron Sandoval · 18 Jun 2025 18:28 UTC
29 points
0 comments · 25 min read · LW link

Sparsely-connected Cross-layer Transcoders

jacob_drori · 18 Jun 2025 17:13 UTC
51 points
3 comments · 12 min read · LW link

New Endorsements for “If Anyone Builds It, Everyone Dies”

Malo · 18 Jun 2025 16:30 UTC
488 points
55 comments · 4 min read · LW link
(intelligence.org)

Moral Alignment: An Idea I’m Embarrassed I Didn’t Think of Myself

Gordon Seidoh Worley · 18 Jun 2025 15:42 UTC
20 points
54 comments · 2 min read · LW link

This was meant for you

Logan Kieller · 18 Jun 2025 15:26 UTC
12 points
0 comments · 8 min read · LW link
(agenticconjectures.substack.com)

Children of War: Hidden dangers of an AI arms race

Peter Kuhn · 18 Jun 2025 15:19 UTC
4 points
0 comments · 7 min read · LW link