Moving Past the Question of Consciousness: A Thought Experiment

Satya Benson · 19 Jun 2025 19:52 UTC
12 points
8 comments · 2 min read · LW link
(satchlj.com)

S-Expressions as a Design Language: A Tool for Deconfusion in Alignment

Johannes C. Mayer · 19 Jun 2025 19:03 UTC
5 points
0 comments · 6 min read · LW link

AISEC: Why not to be shy.

xen9 · 19 Jun 2025 18:16 UTC
4 points
1 comment · 1 min read · LW link

LLMs as amplifiers, not assistants

Caleb Biddulph · 19 Jun 2025 17:21 UTC
27 points
8 comments · 7 min read · LW link

How The Singer Sang His Tales

adamShimi · 19 Jun 2025 17:06 UTC
18 points
0 comments · 36 min read · LW link
(formethods.substack.com)

Key paths, plans and strategies to AI safety success

Adam Jones · 19 Jun 2025 16:56 UTC
13 points
0 comments · 6 min read · LW link
(bluedot.org)

AI safety techniques leveraging distillation

ryan_greenblatt · 19 Jun 2025 14:31 UTC
61 points
0 comments · 12 min read · LW link

Political Funding Expertise (Post 6 of 7 on AI Governance)

Mass_Driver · 19 Jun 2025 14:14 UTC
59 points
4 comments · 14 min read · LW link

Documents Are Dead. Long Live the Conversational Proxy.

8harath · 19 Jun 2025 14:01 UTC
−9 points
1 comment · 1 min read · LW link

[Question] How did you find out about AI Safety? Why and how did you get involved?

Ana Lopez · 19 Jun 2025 14:00 UTC
1 point
0 comments · 1 min read · LW link

A deep critique of AI 2027’s bad timeline models

titotal · 19 Jun 2025 13:29 UTC
372 points
40 comments · 39 min read · LW link
(titotal.substack.com)

AI #121 Part 1: New Connections

Zvi · 19 Jun 2025 13:00 UTC
32 points
12 comments · 39 min read · LW link
(thezvi.wordpress.com)

AI can win a conflict against us

19 Jun 2025 7:20 UTC
6 points
0 comments · 2 min read · LW link

Different goals may bring AI into conflict with us

19 Jun 2025 7:19 UTC
5 points
2 comments · 2 min read · LW link

My Failed AI Safety Research Projects (Q1/Q2 2025)

Adam Newgas · 19 Jun 2025 3:55 UTC
26 points
3 comments · 3 min read · LW link

TT Self Study Journal # 1

TristanTrim · 18 Jun 2025 23:36 UTC
8 points
6 comments · 6 min read · LW link

On May 1, 2033, humanity discovered that AI was fairly easy to align.

Yitz · 18 Jun 2025 19:57 UTC
10 points
3 comments · 1 min read · LW link

New Ethics for the AI Age

Matthieu Tehenan · 18 Jun 2025 19:30 UTC
1 point
0 comments · 6 min read · LW link

Gemini 2.5 Pro: From 0506 to 0605

Zvi · 18 Jun 2025 19:10 UTC
33 points
0 comments · 8 min read · LW link
(thezvi.wordpress.com)

Factored Cognition Strengthens Monitoring and Thwarts Attacks

Aaron Sandoval · 18 Jun 2025 18:28 UTC
29 points
0 comments · 25 min read · LW link

Sparsely-connected Cross-layer Transcoders

jacob_drori · 18 Jun 2025 17:13 UTC
51 points
3 comments · 12 min read · LW link

New Endorsements for “If Anyone Builds It, Everyone Dies”

Malo · 18 Jun 2025 16:30 UTC
488 points
55 comments · 4 min read · LW link
(intelligence.org)

Moral Alignment: An Idea I’m Embarrassed I Didn’t Think of Myself

Gordon Seidoh Worley · 18 Jun 2025 15:42 UTC
20 points
54 comments · 2 min read · LW link

This was meant for you

Logan Kieller · 18 Jun 2025 15:26 UTC
12 points
0 comments · 8 min read · LW link
(agenticconjectures.substack.com)

Children of War: Hidden dangers of an AI arms race

Peter Kuhn · 18 Jun 2025 15:19 UTC
4 points
0 comments · 7 min read · LW link

Open Source Search (Summary)

samuelshadrach · 18 Jun 2025 7:35 UTC
21 points
1 comment · 6 min read · LW link
(samuelshadrach.com)

Fictional Thinking and Real Thinking

johnswentworth · 17 Jun 2025 19:13 UTC
57 points
11 comments · 4 min read · LW link

The Curious Case of the bos_token

larry-dial · 17 Jun 2025 19:00 UTC
26 points
4 comments · 10 min read · LW link

AISN #57: The RAISE Act

17 Jun 2025 18:02 UTC
6 points
0 comments · 3 min read · LW link
(newsletter.safe.ai)

AI Safety at the Frontier: Paper Highlights, May ’25

gasteigerjo · 17 Jun 2025 17:16 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

[Linkpost] The lethal trifecta for AI agents: private data, untrusted content, and external communication

Gunnar_Zarncke · 17 Jun 2025 16:09 UTC
13 points
3 comments · 1 min read · LW link
(simonwillison.net)

Agentic Interpretability: A Strategy Against Gradual Disempowerment

17 Jun 2025 14:52 UTC
17 points
6 comments · 2 min read · LW link

Prover-Estimator Debate: A New Scalable Oversight Protocol

17 Jun 2025 13:53 UTC
89 points
19 comments · 5 min read · LW link

o3 Turns Pro

Zvi · 17 Jun 2025 13:50 UTC
30 points
1 comment · 14 min read · LW link
(thezvi.wordpress.com)

Watch R1 “think” with animated chains of thought

future_detective · 17 Jun 2025 10:38 UTC
4 points
0 comments · 1 min read · LW link
(github.com)

Serving LLM on Huawei CloudMatrix

sanxiyn · 17 Jun 2025 5:59 UTC
24 points
7 comments · 1 min read · LW link
(arxiv.org)

Personal agents

Roman Leventov · 17 Jun 2025 2:05 UTC
9 points
1 comment · 7 min read · LW link

I made a card game to reduce cognitive biases and logical fallacies but I’m not sure what DV to test in a study on its effectiveness.

Brad Dunn · 17 Jun 2025 1:02 UTC
50 points
15 comments · 5 min read · LW link

Notes on Meetup Ideas

Commander Zander · 17 Jun 2025 0:11 UTC
12 points
4 comments · 2 min read · LW link

Darkness Meditation—for NZ Winter Solstice 2025

joshuamerriam · 16 Jun 2025 23:58 UTC
2 points
0 comments · 4 min read · LW link

[Question] Are superhuman savants real?

Bunthut · 16 Jun 2025 22:02 UTC
15 points
4 comments · 1 min read · LW link

Ok, AI Can Write Pretty Good Fiction Now

JustisMills · 16 Jun 2025 21:13 UTC
59 points
34 comments · 6 min read · LW link
(justismills.substack.com)

Subjective experience is most likely physical

martinkunev · 16 Jun 2025 20:54 UTC
5 points
3 comments · 4 min read · LW link

VLMs can Aggregate Scattered Training Patches

LINGJIE CHEN · 16 Jun 2025 18:25 UTC
2 points
0 comments · 4 min read · LW link

Setpoint = The experience we attend to

jimmy · 16 Jun 2025 17:34 UTC
22 points
0 comments · 7 min read · LW link

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

16 Jun 2025 16:43 UTC
69 points
2 comments · 8 min read · LW link

How LLM Beliefs Change During Chain-of-Thought Reasoning

16 Jun 2025 16:18 UTC
32 points
3 comments · 5 min read · LW link

Convergent Linear Representations of Emergent Misalignment

16 Jun 2025 15:47 UTC
76 points
1 comment · 8 min read · LW link

Model Organisms for Emergent Misalignment

16 Jun 2025 15:46 UTC
118 points
19 comments · 5 min read · LW link

Coaching AI: A Relational Approach to AI Safety

Priyanka Bharadwaj · 16 Jun 2025 15:33 UTC
11 points
0 comments · 5 min read · LW link