Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
35 points
8 comments · 6 min read · LW link

Agentic Misalignment: How LLMs Could be Insider Threats

20 Jun 2025 22:34 UTC
83 points
13 comments · 6 min read · LW link

Clarifying “wisdom”: Foundational topics for aligned AIs to prioritize before irreversible decisions

Anthony DiGiovanni · 20 Jun 2025 21:55 UTC
40 points
2 comments · 12 min read · LW link

Are Intelligent Agents More Ethical?

PeterMcCluskey · 20 Jun 2025 21:26 UTC
13 points
7 comments · 2 min read · LW link

An AI Arms Race Scenario

shanzson · 20 Jun 2025 19:25 UTC
2 points
2 comments · 1 min read · LW link

Making deals with early schemers

20 Jun 2025 18:21 UTC
127 points
41 comments · 15 min read · LW link

Ivan Gayton: A Right and a Duty

Elizabeth · 20 Jun 2025 18:20 UTC
21 points
0 comments · 1 min read · LW link
(acesounderglass.com)

What is the functional role of SAE errors?

20 Jun 2025 18:11 UTC
12 points
6 comments · 38 min read · LW link

Musings on AI Companies of 2025-2026 (Jun 2025)

Vladimir_Nesov · 20 Jun 2025 17:14 UTC
66 points
4 comments · 3 min read · LW link

Escaping the Jungles of Norwood: A Rationalist’s Guide to Male Pattern Baldness

AlphaAndOmega · 20 Jun 2025 16:40 UTC
12 points
10 comments · 1 min read · LW link
(open.substack.com)

Prefix cache untrusted monitors: a method to apply after you catch your AI

ryan_greenblatt · 20 Jun 2025 15:56 UTC
33 points
2 comments · 7 min read · LW link

Did the Army Poison a Bunch of Women in Minnesota?

rba · 20 Jun 2025 15:33 UTC
54 points
2 comments · 4 min read · LW link

AI #121 Part 2: The OpenAI Files

Zvi · 20 Jun 2025 14:50 UTC
37 points
9 comments · 41 min read · LW link
(thezvi.wordpress.com)

Smarter Models Lie Less

Expertium · 20 Jun 2025 13:31 UTC
6 points
0 comments · 2 min read · LW link

AI Safety Communicators Meet-up

Vishakha · 20 Jun 2025 12:34 UTC
3 points
0 comments · 1 min read · LW link

X explains Z% of the variance in Y

Leon Lang · 20 Jun 2025 12:17 UTC
160 points
36 comments · 9 min read · LW link

Yes RAND, AI Could Really Cause Human Extinction [crosspost]

otto.barten · 20 Jun 2025 11:42 UTC
17 points
4 comments · 4 min read · LW link
(www.existentialriskobservatory.org)

Misalignment or misuse? The AGI alignment tradeoff

Max_He-Ho · 20 Jun 2025 10:43 UTC
3 points
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Paphos

Yudhister Kumar · 20 Jun 2025 9:25 UTC
4 points
0 comments · 1 min read · LW link
(yudhister.me)

Rome

Yudhister Kumar · 20 Jun 2025 9:23 UTC
3 points
0 comments · 2 min read · LW link
(yudhister.me)

Geneva

Yudhister Kumar · 20 Jun 2025 9:22 UTC
4 points
0 comments · 1 min read · LW link
(yudhister.me)

Toledo

Yudhister Kumar · 20 Jun 2025 9:18 UTC
3 points
0 comments · 2 min read · LW link
(www.yudhister.me)

Graphing AI economic growth rates, or time to Dyson Swarm

denkenberger · 20 Jun 2025 7:00 UTC
4 points
2 comments · 1 min read · LW link

the silk pajamas effect

thiccythot · 20 Jun 2025 3:31 UTC
41 points
11 comments · 4 min read · LW link

Change And Identity: a Story and Discussion on the Evolving Self

Rob Lucas · 20 Jun 2025 1:44 UTC
0 points
0 comments · 19 min read · LW link
(open.substack.com)

Moving Past the Question of Consciousness: A Thought Experiment

Satya Benson · 19 Jun 2025 19:52 UTC
12 points
8 comments · 2 min read · LW link
(satchlj.com)

S-Expressions as a Design Language: A Tool for Deconfusion in Alignment

Johannes C. Mayer · 19 Jun 2025 19:03 UTC
5 points
0 comments · 6 min read · LW link

AISEC: Why not to be shy.

xen9 · 19 Jun 2025 18:16 UTC
4 points
1 comment · 1 min read · LW link

LLMs as amplifiers, not assistants

Caleb Biddulph · 19 Jun 2025 17:21 UTC
27 points
8 comments · 7 min read · LW link

How The Singer Sang His Tales

adamShimi · 19 Jun 2025 17:06 UTC
18 points
0 comments · 36 min read · LW link
(formethods.substack.com)

Key paths, plans and strategies to AI safety success

Adam Jones · 19 Jun 2025 16:56 UTC
13 points
0 comments · 6 min read · LW link
(bluedot.org)

AI safety techniques leveraging distillation

ryan_greenblatt · 19 Jun 2025 14:31 UTC
61 points
0 comments · 12 min read · LW link

Political Funding Expertise (Post 6 of 7 on AI Governance)

Mass_Driver · 19 Jun 2025 14:14 UTC
59 points
4 comments · 14 min read · LW link

Documents Are Dead. Long Live the Conversational Proxy.

8harath · 19 Jun 2025 14:01 UTC
−9 points
1 comment · 1 min read · LW link

[Question] How did you find out about AI Safety? Why and how did you get involved?

Ana Lopez · 19 Jun 2025 14:00 UTC
1 point
0 comments · 1 min read · LW link

A deep critique of AI 2027’s bad timeline models

titotal · 19 Jun 2025 13:29 UTC
372 points
40 comments · 39 min read · LW link
(titotal.substack.com)

AI #121 Part 1: New Connections

Zvi · 19 Jun 2025 13:00 UTC
32 points
12 comments · 39 min read · LW link
(thezvi.wordpress.com)

AI can win a conflict against us

19 Jun 2025 7:20 UTC
6 points
0 comments · 2 min read · LW link

Different goals may bring AI into conflict with us

19 Jun 2025 7:19 UTC
5 points
2 comments · 2 min read · LW link

My Failed AI Safety Research Projects (Q1/Q2 2025)

Adam Newgas · 19 Jun 2025 3:55 UTC
26 points
3 comments · 3 min read · LW link

TT Self Study Journal # 1

TristanTrim · 18 Jun 2025 23:36 UTC
8 points
6 comments · 6 min read · LW link

On May 1, 2033, humanity discovered that AI was fairly easy to align.

Yitz · 18 Jun 2025 19:57 UTC
10 points
3 comments · 1 min read · LW link

New Ethics for the AI Age

Matthieu Tehenan · 18 Jun 2025 19:30 UTC
1 point
0 comments · 6 min read · LW link

Gemini 2.5 Pro: From 0506 to 0605

Zvi · 18 Jun 2025 19:10 UTC
33 points
0 comments · 8 min read · LW link
(thezvi.wordpress.com)

Factored Cognition Strengthens Monitoring and Thwarts Attacks

Aaron Sandoval · 18 Jun 2025 18:28 UTC
29 points
0 comments · 25 min read · LW link

Sparsely-connected Cross-layer Transcoders

jacob_drori · 18 Jun 2025 17:13 UTC
51 points
3 comments · 12 min read · LW link

New Endorsements for “If Anyone Builds It, Everyone Dies”

Malo · 18 Jun 2025 16:30 UTC
488 points
55 comments · 4 min read · LW link
(intelligence.org)

Moral Alignment: An Idea I’m Embarrassed I Didn’t Think of Myself

Gordon Seidoh Worley · 18 Jun 2025 15:42 UTC
20 points
54 comments · 2 min read · LW link

This was meant for you

Logan Kieller · 18 Jun 2025 15:26 UTC
12 points
0 comments · 8 min read · LW link
(agenticconjectures.substack.com)

Children of War: Hidden dangers of an AI arms race

Peter Kuhn · 18 Jun 2025 15:19 UTC
4 points
0 comments · 7 min read · LW link