A relatively brief explanation of Boltzmann Brains

Eliezer Yudkowsky

16 May 2026 21:19 UTC

206 points

155 comments · 4 min read · LW link

Benchmarking Real Work

kaivu, leni, rohuang and zef

16 May 2026 20:43 UTC

30 points

2 comments · 4 min read · LW link

Critique Systems, Not Reality

Morphism

16 May 2026 19:11 UTC

5 points

1 comment · 25 min read · LW link

(thothhermes.substack.com)

Trying to use NLAs to find out how Qwen 2.5 7B does multiplication

Hannes Thurnherr

16 May 2026 19:05 UTC

23 points

4 comments · 6 min read · LW link

A Year Late, Claude Finally Beats Pokémon

Julian Bradshaw

16 May 2026 7:05 UTC

162 points

12 comments · 9 min read · LW link

NLA Verbalizations on AuditBench: Llama 70B

Realmbird

16 May 2026 5:25 UTC

10 points

0 comments · 3 min read · LW link

An Introduction to Exemplar Partitioning for Mechanistic Interpretability

Jessica Rumbelow

16 May 2026 3:58 UTC

69 points

7 comments · 11 min read · LW link

(www.leap-labs.com)

An Argument for Analogies

James Stephen Brown

16 May 2026 2:21 UTC

11 points

0 comments · 3 min read · LW link

Incriminating misaligned AI models via distillation

Alek Westover, SebastianP, Alex Mallen, Jozdien, Alexa Pan, Julian Stastny and Vivek Hebbar

15 May 2026 21:43 UTC

115 points

12 comments · 5 min read · LW link

Critical Thinking as a Gym Schedule

Alrenous

15 May 2026 20:49 UTC

0 points

4 comments · 3 min read · LW link

Why I am not too worried about AIpocalypse: Scott Alexander vs Nicolaus Copernicus

Shmi

15 May 2026 20:31 UTC

7 points

15 comments · 2 min read · LW link

Risk reports need to address deployment-time spread of misalignment

Alex Mallen

15 May 2026 18:20 UTC

64 points

1 comment · 5 min read · LW link

Monthly Roundup #42: May 2026

Zvi

15 May 2026 16:50 UTC

30 points

2 comments · 24 min read · LW link

(thezvi.wordpress.com)

Mechanistic estimation for expectations of random products

Jacob_Hilton, George Robinson, Eric Neyman, paulfchristiano, Mikewins, Victor Lecomte, Wilson Wu and Gabriel Wu

15 May 2026 16:50 UTC

50 points

0 comments · 5 min read · LW link

(www.alignment.org)

Clarifying the Darwinian Honeymoon

Elias Schmied

15 May 2026 16:23 UTC

20 points

6 comments · 3 min read · LW link

Announcing the Center for Shared AI Prosperity

Dylan Matthews

15 May 2026 12:57 UTC

39 points

13 comments · 2 min read · LW link

MATS 9 Retrospective & Advice

beyarkay (Boyd Kane)

15 May 2026 12:30 UTC

199 points

11 comments · 18 min read · LW link

(boydkane.com)

Data Quality is Way Underrated, and We Should Start Funding It.

Osapinion

15 May 2026 4:07 UTC

4 points

0 comments · 2 min read · LW link

(substack.com)

Don’t be too Clever to Take Obvious Advice

Hide

15 May 2026 3:01 UTC

95 points

26 comments · 2 min read · LW link

(hidefromit.substack.com)

Some observations about NLA explanations

loops

15 May 2026 2:15 UTC

21 points

0 comments · 3 min read · LW link

The hard core of alignment (is robustifying RL)

Cole Wyeth

15 May 2026 1:02 UTC

39 points

12 comments · 13 min read · LW link

Convergent Abstraction Hypothesis

Jan_Kulveit

15 May 2026 0:04 UTC

122 points

20 comments · 6 min read · LW link

Emma Baker on ADHD

koratkar

14 May 2026 23:29 UTC

8 points

2 comments · 3 min read · LW link

(emma00baker.substack.com)

Designing AI factual claims for “easy verification”

Raemon

14 May 2026 23:23 UTC

33 points

17 comments · 2 min read · LW link

Automated Alignment is Harder Than You Think

Aleksandr Bowkis, Marie_DB, Jacob Pfau and Geoffrey Irving

14 May 2026 22:01 UTC

143 points

6 comments · 3 min read · LW link

(arxiv.org)

2B scoring model flags out-of-domain misalignment, suggesting specialist judges have potential for audits

burnssa

14 May 2026 20:00 UTC

8 points

0 comments · 6 min read · LW link

The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness

Charlie Griffin and Patrick Leask

14 May 2026 17:05 UTC

59 points

3 comments · 3 min read · LW link

AI #168: Not Leading the Future

Zvi

14 May 2026 14:10 UTC

38 points

2 comments · 45 min read · LW link

(thezvi.wordpress.com)

Why Ensuring Flourishing Is Not About Alignment

ofpetro

14 May 2026 6:24 UTC

5 points

6 comments · 35 min read · LW link

Intervening on Sparse, Anchored Concepts

Sandy Fraser

14 May 2026 4:35 UTC

24 points

3 comments · 10 min read · LW link

Algorithmic Perfection

zw5

14 May 2026 3:44 UTC

5 points

1 comment · 2 min read · LW link

Models finding software vulnerabilities is not the primary source of cybersecurity risk

lc

14 May 2026 3:39 UTC

310 points

24 comments · 2 min read · LW link

Claude is Now Alignment-Pretrained

RogerDearnaley

13 May 2026 23:19 UTC

87 points

9 comments · 1 min read · LW link

(www.anthropic.com)

MATS Autumn 2026 Fellowship Applications Now Open—Apply by June 7

Elise Racine, Raj Thimmiah and Ryan Kidd

13 May 2026 21:40 UTC

21 points

0 comments · 2 min read · LW link

Building Connections

Xenomirant and Jamilya Erkenova

13 May 2026 20:27 UTC

8 points

0 comments · 5 min read · LW link

A lack of introspective ability is not a lack of corrigibility

lc

13 May 2026 20:23 UTC

26 points

3 comments · 1 min read · LW link

Cyber Lack of Security and AI Governance

Zvi

13 May 2026 20:20 UTC

41 points

1 comment · 16 min read · LW link

(thezvi.wordpress.com)

Stickiness in AI Behavioral Design

James_T

13 May 2026 19:55 UTC

10 points

0 comments · 14 min read · LW link

(www.forethought.org)

Predicting Rare LLM Failures with 30× Fewer Rollouts

Santiago Aranguri and Francisco Pernice

13 May 2026 17:53 UTC

55 points

3 comments · 5 min read · LW link

Most “inner work” looks like entertainment.

Chris Lakin

13 May 2026 17:51 UTC

48 points

10 comments · 2 min read · LW link

A Research Agenda for Secret Loyalties

Joe Kwon, Alfie Lamerton, draganover, Dave Banerjee, Bronson Schoen, Daniel Kokotajlo, ryan_greenblatt, Owain_Evans, Fabien Roger and Tom Davidson

13 May 2026 17:34 UTC

35 points

3 comments · 3 min read · LW link

Apollo Update May 2026

Marius Hobbhahn

13 May 2026 16:43 UTC

48 points

0 comments · 1 min read · LW link

(www.apolloresearch.ai)

The case for fine-grained tracking of compute for AI

Farhan and Katherine Biewer

13 May 2026 16:00 UTC

36 points

17 comments · 9 min read · LW link

(forum.effectivealtruism.org)

Vibe Excel and the Future of White-Collar Work

ykevinzhang

13 May 2026 15:39 UTC

13 points

5 comments · 6 min read · LW link

“Community organizer” is a double oxymoron

jchan

13 May 2026 15:10 UTC

5 points

13 comments · 5 min read · LW link

Voters are surprisingly open to talking about AI risk

less_raichu

13 May 2026 14:08 UTC

116 points

11 comments · 3 min read · LW link

Civilization as a tower of holes

Joe Rogero

13 May 2026 13:48 UTC

24 points

3 comments · 4 min read · LW link

(subatomicarticles.com)

Applications Open for Impact Accelerator Program

High Impact Professionals

13 May 2026 8:35 UTC

6 points

0 comments · 1 min read · LW link

Epistemic Immunodepression in the Age of AI

Tuyen Tran

13 May 2026 5:49 UTC

15 points

5 comments · 2 min read · LW link

Lorxus Does Budget Inkhaven Again: 4/29, 4/30, Highlights, Postmortem

Lorxus

13 May 2026 1:37 UTC

15 points

0 comments · 3 min read · LW link

(tiled-with-pentagons.blogspot.com)