The AI’s Toolbox: From Soggy Toast to Optimal Solutions

Thehumanproject.ai

22 Jun 2025 20:54 UTC

1 point

0 comments, 8 min read

Black-box interpretability methodology blueprint: Probing runaway optimisation in LLMs

Roland Pihlakas and Three Laws

22 Jun 2025 18:16 UTC

17 points

0 comments, 7 min read

The Croissant Principle: A Theory of AI Generalization

Jeffrey Liang

22 Jun 2025 17:58 UTC

20 points

6 comments, 2 min read

Relational Design Can’t Be Left to Chance

Priyanka Bharadwaj

22 Jun 2025 15:32 UTC

5 points

0 comments, 3 min read

Grounding to Avoid Airplane Delays

jefftk

22 Jun 2025 1:50 UTC

30 points

0 comments, 2 min read

(www.jefftk.com)

Open questions on compatibilist free will and subjunctive dependence

Jack Thompson

22 Jun 2025 1:15 UTC

3 points

0 comments, 1 min read

(jacktlab.substack.com)

The Sixteen Kinds of Intimacy

Ruby

21 Jun 2025 19:59 UTC

57 points

2 comments, 5 min read

Book review: Against Method

Valdes

21 Jun 2025 18:59 UTC

9 points

0 comments, 6 min read

Contrived evaluations are useful evaluations

pradyuprasad

21 Jun 2025 18:18 UTC

3 points

0 comments, 3 min read

(speculativedecoding.substack.com)

Consider chilling out in 2028

Valentine

21 Jun 2025 17:07 UTC

208 points

144 comments, 13 min read

Upcoming workshop on Post-AGI Civilizational Equilibria

David Duvenaud, Jan_Kulveit, Raymond Douglas, Nora_Ammann and David Scott Krueger

21 Jun 2025 15:57 UTC

25 points

0 comments, 1 min read

Genomic emancipation

TsviBT

21 Jun 2025 8:15 UTC

83 points

14 comments, 26 min read

Evaluating the Risk of Job Displacement by Transformative AI Automation in Developing Countries: A Case Study on Brazil

Abubakar

21 Jun 2025 0:48 UTC

4 points

0 comments, 15 min read

Backdoor awareness and misaligned personas in reasoning models

James Chua, Owain_Evans and Jan Betley

20 Jun 2025 23:38 UTC

37 points

8 comments, 6 min read

Agentic Misalignment: How LLMs Could be Insider Threats

Aengus Lynch, Benjamin Wright, Ethan Perez and evhub

20 Jun 2025 22:34 UTC

77 points

13 comments, 6 min read

Clarifying “wisdom”: Foundational topics for aligned AIs to prioritize before irreversible decisions

Anthony DiGiovanni

20 Jun 2025 21:55 UTC

42 points

2 comments, 12 min read

Are Intelligent Agents More Ethical?

PeterMcCluskey

20 Jun 2025 21:26 UTC

13 points

7 comments, 2 min read

An AI Arms Race Scenario

shanzson

20 Jun 2025 19:25 UTC

2 points

2 comments, 1 min read

Making deals with early schemers

Julian Stastny, Olli Järviniemi and Buck

20 Jun 2025 18:21 UTC

133 points

42 comments, 15 min read

Ivan Gayton: A Right and a Duty

Elizabeth

20 Jun 2025 18:20 UTC

21 points

0 comments, 1 min read

(acesounderglass.com)

What is the functional role of SAE errors?

Taras Kutsyk, Tim Hua, woog and Andre Assis

20 Jun 2025 18:11 UTC

12 points

6 comments, 38 min read

Musings on AI Companies of 2025-2026 (Jun 2025)

Vladimir_Nesov

20 Jun 2025 17:14 UTC

66 points

4 comments, 3 min read

Escaping the Jungles of Norwood: A Rationalist’s Guide to Male Pattern Baldness

AlphaAndOmega

20 Jun 2025 16:40 UTC

12 points

10 comments, 1 min read

(open.substack.com)

Prefix cache untrusted monitors: a method to apply after you catch your AI

ryan_greenblatt

20 Jun 2025 15:56 UTC

33 points

2 comments, 7 min read

Did the Army Poison a Bunch of Women in Minnesota?

rba

20 Jun 2025 15:33 UTC

54 points

2 comments, 4 min read

AI #121 Part 2: The OpenAI Files

Zvi

20 Jun 2025 14:50 UTC

37 points

9 comments, 41 min read

(thezvi.wordpress.com)

Smarter Models Lie Less

Expertium

20 Jun 2025 13:31 UTC

6 points

0 comments, 2 min read

AI Safety Communicators Meet-up

Vishakha

20 Jun 2025 12:34 UTC

3 points

0 comments, 1 min read

X explains Z% of the variance in Y

Leon Lang

20 Jun 2025 12:17 UTC

160 points

36 comments, 9 min read

Yes RAND, AI Could Really Cause Human Extinction [crosspost]

otto.barten

20 Jun 2025 11:42 UTC

17 points

4 comments, 4 min read

(www.existentialriskobservatory.org)

Misalignment or misuse? The AGI alignment tradeoff

Max_He-Ho

20 Jun 2025 10:43 UTC

3 points

0 comments, 1 min read

(forum.effectivealtruism.org)

Graphing AI economic growth rates, or time to Dyson Swarm

denkenberger

20 Jun 2025 7:00 UTC

4 points

2 comments, 1 min read

the silk pajamas effect

thiccythot

20 Jun 2025 3:31 UTC

41 points

11 comments, 4 min read

Change And Identity: a Story and Discussion on the Evolving Self

Rob Lucas

20 Jun 2025 1:44 UTC

0 points

0 comments, 19 min read

(open.substack.com)

Moving Past the Question of Consciousness: A Thought Experiment

Satya Benson

19 Jun 2025 19:52 UTC

13 points

8 comments, 2 min read

(satchlj.com)

S-Expressions as a Design Language: A Tool for Deconfusion in Alignment

Johannes C. Mayer

19 Jun 2025 19:03 UTC

5 points

0 comments, 6 min read

AISEC: Why to not to be shy.

xen9

19 Jun 2025 18:16 UTC

4 points

1 comment, 1 min read

LLMs as amplifiers, not assistants

Caleb Biddulph

19 Jun 2025 17:21 UTC

27 points

8 comments, 7 min read

How The Singer Sang His Tales

adamShimi

19 Jun 2025 17:06 UTC

18 points

0 comments, 36 min read

(formethods.substack.com)

Key paths, plans and strategies to AI safety success

Adam Jones

19 Jun 2025 16:56 UTC

19 points

1 comment, 6 min read

(bluedot.org)

AI safety techniques leveraging distillation

ryan_greenblatt

19 Jun 2025 14:31 UTC

62 points

0 comments, 12 min read

Political Funding Expertise (Post 6 of 7 on AI Governance)

Mass_Driver

19 Jun 2025 14:14 UTC

59 points

4 comments, 14 min read

Documents Are Dead. Long Live the Conversational Proxy.

8harath

19 Jun 2025 14:01 UTC

−9 points

1 comment, 1 min read

[Question] How did you find out about AI Safety? Why and how did you get involved?

Ana Lopez

19 Jun 2025 14:00 UTC

1 point

0 comments, 1 min read

A deep critique of AI 2027’s bad timeline models

titotal

19 Jun 2025 13:29 UTC

378 points

40 comments, 39 min read

(titotal.substack.com)

AI #121 Part 1: New Connections

Zvi

19 Jun 2025 13:00 UTC

32 points

12 comments, 39 min read

(thezvi.wordpress.com)

AI can win a conflict against us

Algon, steven0461 and Vishakha

19 Jun 2025 7:20 UTC

6 points

0 comments, 2 min read

Different goals may bring AI into conflict with us

Algon, steven0461 and Vishakha

19 Jun 2025 7:19 UTC

5 points

2 comments, 2 min read

My Failed AI Safety Research Projects (Q1/Q2 2025)

Adam Newgas

19 Jun 2025 3:55 UTC

27 points

3 comments, 3 min read

TT Self Study Journal # 1

TristanTrim

18 Jun 2025 23:36 UTC

8 points

6 comments, 6 min read