All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 202420252026

All Jan Feb Mar AprMayJun Jul Aug Sep Oct Nov Dec

All 1 2 3 4 5 6 7 8 91011 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

Mind the Coherence Gap: Lessons from Steering Llama with Goodfire

eitan sprejer9 May 2025 21:29 UTC

4 points

1 comment6 min readLW link

My Experience With EMDR

Sable9 May 2025 21:25 UTC

22 points

0 comments11 min readLW link

(affablyevil.substack.com)

AI’s Hidden Game: Understanding Strategic Deception in AI and Why It Matters for Our Future

EmilyinAI9 May 2025 20:01 UTC

4 points

0 comments6 min readLW link

Muddling Through Some Thoughts on the Nature of Historiography

E.G. Blee-Goldman9 May 2025 19:04 UTC

2 points

0 comments4 min readLW link

A Guide to AI 2027

koenrane9 May 2025 17:14 UTC

0 points

1 comment28 min readLW link

Let’s stop making “Intelligence scale” graphs with humans and AI

Expertium9 May 2025 16:01 UTC

3 points

15 comments1 min readLW link

Slow corporations as an intuition pump for AI R&D automation

ryan_greenblatt and elifland

9 May 2025 14:49 UTC

91 points

25 comments9 min readLW link

Cheaters Gonna Cheat Cheat Cheat Cheat Cheat

Zvi9 May 2025 14:30 UTC

57 points

4 comments22 min readLW link

(thezvi.wordpress.com)

Humans vs LLM, memes as theorems

Yaroslav Granowski9 May 2025 13:26 UTC

1 point

0 comments1 min readLW link

Moving towards a question-based planning framework, instead of task lists

casualphysicsenjoyer9 May 2025 12:18 UTC

4 points

1 comment8 min readLW link

(substack.com)

Jim Babcock’s Mainline Doom Scenario: Human-Level AI Can’t Control Its Successor

Liron and jimrandomh

9 May 2025 5:20 UTC

30 points

4 comments62 min readLW link

(www.youtube.com)

Attend the 2025 Reproductive Frontiers Summit, June 10-12

TsviBT and Rachel Reid

9 May 2025 5:17 UTC

59 points

0 comments3 min readLW link

Interest In Conflict Is Instrumentally Convergent

Screwtape9 May 2025 2:16 UTC

68 points

58 comments10 min readLW link

Is ChatGPT actually fixed now?

sjadler8 May 2025 23:34 UTC

17 points

0 comments1 min readLW link

(stevenadler.substack.com)

Post EAG London AI x-Safety Co-working Retreat

plex8 May 2025 23:00 UTC

10 points

0 comments1 min readLW link

a brief critique of reduction

Vadim Golub8 May 2025 22:43 UTC

−17 points

4 comments2 min readLW link

Video & transcript: Challenges for Safe & Beneficial Brain-Like AGI

Steven Byrnes8 May 2025 21:11 UTC

27 points

0 comments18 min readLW link

Appendix: Interpretable by Design—Constraint Sets with Disjoint Limit Points

Ronak_Mehta8 May 2025 21:09 UTC

2 points

0 comments2 min readLW link

Interpretable by Design—Constraint Sets with Disjoint Limit Points

Ronak_Mehta8 May 2025 21:08 UTC

24 points

2 comments9 min readLW link

(ronakrm.github.io)

Is there a Half-Life for the Success Rates of AI Agents?

Matrice Jacobine8 May 2025 20:10 UTC

8 points

0 comments1 min readLW link

(www.tobyord.com)

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

Buck and Julian Stastny

8 May 2025 19:06 UTC

80 points

3 comments15 min readLW link

Behold the Pale Child (escaping Moloch’s Mad Maze)

rogersbacon8 May 2025 16:36 UTC

8 points

16 comments11 min readLW link

(www.secretorum.life)

An alignment safety case sketch based on debate

Marie_DB, Jacob Pfau, Benjamin Hilton and Geoffrey Irving

8 May 2025 15:02 UTC

62 points

21 comments25 min readLW link

(arxiv.org)

Mechanistic Interpretability Via Learning Differential Equations: AI Safety Camp Project Intermediate Report.

Valentin2026, ayoakin, Eduard Kovalets, tz3r0n4r, Soumyadeep Bose, Utkarsh Priyadarshi, Varun Piram and Axel Ahlqvist

8 May 2025 14:45 UTC

8 points

0 comments7 min readLW link

AI #115: The Evil Applications Division

Zvi8 May 2025 13:40 UTC

32 points

3 comments62 min readLW link

(thezvi.wordpress.com)

The Steganographic Potentials of Language Models

Artem Karpov, Tinuade and SCho

8 May 2025 11:23 UTC

9 points

0 comments1 min readLW link

Our bet on whether the AI market will crash

Remmelt and mabramov

8 May 2025 9:56 UTC

29 points

4 comments1 min readLW link

Sparse Concept Anchoring

Sandy Fraser8 May 2025 8:59 UTC

6 points

0 comments3 min readLW link

Orthogonality Thesis in layman’s terms.

Michael (@lethal_ai)8 May 2025 8:31 UTC

1 point

0 comments2 min readLW link

Arkose may be closing, but you can help

Victoria Brook8 May 2025 7:28 UTC

8 points

0 comments2 min readLW link

Healing powers of meditation or the role of attention in humoral regulation.

Yaroslav Granowski8 May 2025 6:48 UTC

7 points

0 comments1 min readLW link

Orienting Toward Wizard Power

johnswentworth8 May 2025 5:23 UTC

591 points

148 comments5 min readLW link

Relational Alignment: Trust, Repair, and the Emotional Work of AI

Priyanka Bharadwaj8 May 2025 2:44 UTC

3 points

0 comments3 min readLW link

There’s more low-hanging fruit in interdisciplinary work thanks to LLMs

ChristianKl7 May 2025 19:48 UTC

27 points

2 comments1 min readLW link

OpenAI Claims Nonprofit Will Retain Nominal Control

Zvi7 May 2025 19:40 UTC

65 points

4 comments11 min readLW link

(thezvi.wordpress.com)

Social status games might have “compute weight class” in the future

Raemon7 May 2025 18:56 UTC

34 points

7 comments2 min readLW link

Events of Low Probability: Buridan’s Principle

Nikita Gladkov7 May 2025 18:46 UTC

12 points

1 comment10 min readLW link

[Question] Which journalists would you give quotes to? [one journalist per comment, agree vote for trustworthy]

Nathan Young7 May 2025 18:39 UTC

19 points

33 comments1 min readLW link

Please Donate to CAIP (Post 1 of 7 on AI Governance)

Mass_Driver7 May 2025 17:13 UTC

126 points

20 comments33 min readLW link

UK AISI’s Alignment Team: Research Agenda

Benjamin Hilton, Jacob Pfau, Marie_DB and Geoffrey Irving

7 May 2025 16:33 UTC

115 points

3 comments11 min readLW link

Four Predictions About OpenAI’s Plans To Retain Nonprofit Control

garrison7 May 2025 15:48 UTC

12 points

0 comments5 min readLW link

(www.obsolete.pub)

A Disciplined Way to Avoid Wireheading

amitlevy497 May 2025 15:20 UTC

18 points

6 comments5 min readLW link

(ivy0.substack.com)

Reflections on Compatibilism, Ontological Translations, and the Artificial Divine

Mahdi Complex7 May 2025 12:16 UTC

2 points

1 comment22 min readLW link

The Historical Parallels: Preliminary Reflection

EQ7 May 2025 8:06 UTC

3 points

0 comments9 min readLW link

(eqmind.substack.com)

European Links (07.05.25)

Martin Sustrik7 May 2025 4:20 UTC

10 points

0 comments2 min readLW link

(250bpm.substack.com)

[Question] Chess—“Elo” of random play?

Shankar Sivarajan7 May 2025 2:18 UTC

10 points

16 comments1 min readLW link

$500 + $500 Bounty Problem: Does An (Approximately) Deterministic Maximal Redund Always Exist?

johnswentworth and David Lorell

6 May 2025 23:05 UTC

74 points

19 comments3 min readLW link

Loss Curves

programjames6 May 2025 22:22 UTC

16 points

3 comments4 min readLW link

(github.com)

Negative Results on Group SAEs

Josh Engels6 May 2025 21:49 UTC

78 points

3 comments8 min readLW link

ACX Atlanta May 2025 Meetup

Steve French6 May 2025 21:00 UTC

2 points

0 comments1 min readLW link