Is ChatGPT actually fixed now?

sjadler

8 May 2025 23:34 UTC

17 points

0 comments · 1 min read

(stevenadler.substack.com)

Post EAG London AI x-Safety Co-working Retreat

plex

8 May 2025 23:00 UTC

10 points

0 comments · 1 min read

a brief critique of reduction

Vadim Golub

8 May 2025 22:43 UTC

−17 points

4 comments · 2 min read

Video & transcript: Challenges for Safe & Beneficial Brain-Like AGI

Steven Byrnes

8 May 2025 21:11 UTC

27 points

0 comments · 18 min read

Appendix: Interpretable by Design—Constraint Sets with Disjoint Limit Points

Ronak_Mehta

8 May 2025 21:09 UTC

2 points

0 comments · 2 min read

Interpretable by Design—Constraint Sets with Disjoint Limit Points

Ronak_Mehta

8 May 2025 21:08 UTC

24 points

2 comments · 9 min read

(ronakrm.github.io)

Is there a Half-Life for the Success Rates of AI Agents?

Matrice Jacobine

8 May 2025 20:10 UTC

8 points

0 comments · 1 min read

(www.tobyord.com)

Misalignment and Strategic Underperformance: An Analysis of Sandbagging and Exploration Hacking

Buck and Julian Stastny

8 May 2025 19:06 UTC

80 points

3 comments · 15 min read

Behold the Pale Child (escaping Moloch’s Mad Maze)

rogersbacon

8 May 2025 16:36 UTC

8 points

16 comments · 11 min read

(www.secretorum.life)

An alignment safety case sketch based on debate

Marie_DB, Jacob Pfau, Benjamin Hilton and Geoffrey Irving

8 May 2025 15:02 UTC

59 points

21 comments · 25 min read

(arxiv.org)

Mechanistic Interpretability Via Learning Differential Equations: AI Safety Camp Project Intermediate Report.

Valentin2026, ayoakin, Eduard Kovalets, tz3r0n4r, Soumyadeep Bose, Utkarsh Priyadarshi, Varun Piram and Axel Ahlqvist

8 May 2025 14:45 UTC

8 points

0 comments · 7 min read

AI #115: The Evil Applications Division

Zvi

8 May 2025 13:40 UTC

32 points

3 comments · 62 min read

(thezvi.wordpress.com)

The Steganographic Potentials of Language Models

Artem Karpov, Tinuade and SCho

8 May 2025 11:23 UTC

9 points

0 comments · 1 min read

Our bet on whether the AI market will crash

Remmelt and mabramov

8 May 2025 9:56 UTC

25 points

2 comments · 1 min read

Concept-anchored representation engineering for alignment

Sandy Fraser

8 May 2025 8:59 UTC

5 points

0 comments · 3 min read

Orthogonality Thesis in layman’s terms.

Michael (@lethal_ai)

8 May 2025 8:31 UTC

1 point

0 comments · 2 min read

Arkose may be closing, but you can help

Victoria Brook

8 May 2025 7:28 UTC

8 points

0 comments · 2 min read

Healing powers of meditation or the role of attention in humoral regulation.

Yaroslav Granowski

8 May 2025 6:48 UTC

7 points

0 comments · 1 min read

Orienting Toward Wizard Power

johnswentworth

8 May 2025 5:23 UTC

576 points

148 comments · 5 min read

Relational Alignment: Trust, Repair, and the Emotional Work of AI

Priyanka Bharadwaj

8 May 2025 2:44 UTC

3 points

0 comments · 3 min read

There’s more low-hanging fruit in interdisciplinary work thanks to LLMs

ChristianKl

7 May 2025 19:48 UTC

26 points

2 comments · 1 min read

OpenAI Claims Nonprofit Will Retain Nominal Control

Zvi

7 May 2025 19:40 UTC

65 points

4 comments · 11 min read

(thezvi.wordpress.com)

Social status games might have “compute weight class” in the future

Raemon

7 May 2025 18:56 UTC

34 points

7 comments · 2 min read

Events of Low Probability: Buridan’s Principle

Nikita Gladkov

7 May 2025 18:46 UTC

12 points

1 comment · 10 min read

[Question] Which journalists would you give quotes to? [one journalist per comment, agree vote for trustworthy]

Nathan Young

7 May 2025 18:39 UTC

12 points

26 comments · 1 min read

Please Donate to CAIP (Post 1 of 7 on AI Governance)

Mass_Driver

7 May 2025 17:13 UTC

123 points

20 comments · 33 min read

UK AISI’s Alignment Team: Research Agenda

Benjamin Hilton, Jacob Pfau, Marie_DB and Geoffrey Irving

7 May 2025 16:33 UTC

113 points

3 comments · 11 min read

Four Predictions About OpenAI’s Plans To Retain Nonprofit Control

garrison

7 May 2025 15:48 UTC

12 points

0 comments · 5 min read

(www.obsolete.pub)

A Disciplined Way to Avoid Wireheading

amitlevy49

7 May 2025 15:20 UTC

18 points

6 comments · 5 min read

(ivy0.substack.com)

Reflections on Compatibilism, Ontological Translations, and the Artificial Divine

Mahdi Complex

7 May 2025 12:16 UTC

2 points

1 comment · 22 min read

The Historical Parallels: Preliminary Reflection

EQ

7 May 2025 8:06 UTC

3 points

0 comments · 9 min read

(eqmind.substack.com)

European Links (07.05.25)

Martin Sustrik

7 May 2025 4:20 UTC

10 points

0 comments · 2 min read

(250bpm.substack.com)

[Question] Chess—“Elo” of random play?

Shankar Sivarajan

7 May 2025 2:18 UTC

10 points

16 comments · 1 min read

$500 + $500 Bounty Problem: Does An (Approximately) Deterministic Maximal Redund Always Exist?

johnswentworth and David Lorell

6 May 2025 23:05 UTC

73 points

16 comments · 3 min read

Loss Curves

James Camacho

6 May 2025 22:22 UTC

16 points

3 comments · 4 min read

(github.com)

Negative Results on Group SAEs

Josh Engels

6 May 2025 21:49 UTC

76 points

3 comments · 8 min read

ACX Atlanta May 2025 Meetup

Steve French

6 May 2025 21:00 UTC

2 points

0 comments · 1 min read

[Question] What kind of policy by an AGI would make people happy?

StanislavKrym

6 May 2025 18:05 UTC

1 point

2 comments · 1 min read

AI Safety at the Frontier: Paper Highlights, April ’25

gasteigerjo

6 May 2025 14:22 UTC

4 points

0 comments · 7 min read

(aisafetyfrontier.substack.com)

Zuckerberg’s Dystopian AI Vision

Zvi

6 May 2025 13:50 UTC

62 points

7 comments · 11 min read

(thezvi.wordpress.com)

Will protein design tools solve the snake antivenom shortage?

Abhishaike Mahajan

6 May 2025 13:11 UTC

31 points

0 comments · 17 min read

(www.owlposting.com)

Utah Court Case Over State Law Regarding “Personhood” for Nonhuman Intelligences

Stephen Martin

6 May 2025 12:54 UTC

10 points

3 comments · 2 min read

Global Risks Weekly Roundup #18/2025: US tariff shortages, military policing, Gaza famine.

NunoSempere

6 May 2025 10:39 UTC

31 points

2 comments · 3 min read

(blog.sentinel-team.org)

OpenAI’s Jig May Be Up

Vale

6 May 2025 8:51 UTC

3 points

2 comments · 3 min read

My Reasons for Using Anki

Parker Conley

6 May 2025 7:01 UTC

10 points

1 comment · 3 min read

(parconley.com)

It’s ‘Well, actually...’ all the way down

benwr

6 May 2025 5:44 UTC

40 points

34 comments · 1 min read

(www.benwr.net)

Five Hinge‑Questions That Decide Whether AGI Is Five Years Away or Twenty

charlieoneill

6 May 2025 2:48 UTC

127 points

17 comments · 5 min read

Nonprofit to retain control of OpenAI

Archimedes

5 May 2025 23:41 UTC

37 points

1 comment · 1 min read

(openai.com)

Unexpected Conscious Entities

Gunnar_Zarncke

5 May 2025 22:14 UTC

34 points

7 comments · 6 min read

The First Law of Conscious Agency: Linguistic Relativity and the Birth of “I”

Dima (lain)

5 May 2025 21:20 UTC

−17 points

4 comments · 2 min read