Coordinal: A Postmortem.

Ronak_Mehta · 18 May 2026 20:43 UTC

37 points

3 comments · 4 min read

(ronakrm.github.io)

Noticing Confusion: A practice in staying curious

vmehra · 18 May 2026 19:31 UTC

10 points

1 comment · 6 min read

Dating Roundup #12: Sex and Violence

Zvi · 18 May 2026 19:20 UTC

28 points

1 comment · 27 min read

(thezvi.wordpress.com)

Negation Neglect: When models fail to learn negations in training

harrymayne, Lev McKinney and Owain_Evans

18 May 2026 18:37 UTC

119 points

37 comments · 8 min read

So are you some kind of communist?

jchan · 18 May 2026 15:53 UTC

5 points

1 comment · 3 min read

Thoughts on interviewing candidates for AI safety fellowships

beyarkay (Boyd Kane) · 18 May 2026 15:28 UTC

34 points

4 comments · 7 min read

(boydkane.com)

PauseAI Munich Local Group Kickoff

mofeien · 18 May 2026 15:13 UTC

3 points

0 comments · 1 min read

Classifier Context Rot: Monitor Performance Degrades with Context Length

Fabien Roger and Sam Martin

18 May 2026 14:05 UTC

54 points

1 comment · 4 min read

How useful is cross-domain generalization for training LLM monitors?

Fabien Roger and Sam Martin

18 May 2026 13:52 UTC

21 points

0 comments · 4 min read

Jhana Quick Start Guide

Zmavli Caimle · 18 May 2026 8:51 UTC

15 points

3 comments · 11 min read

Links #1: 2026/05 Part 1

papetoast · 18 May 2026 5:04 UTC

10 points

0 comments · 18 min read

why pollen allergies?

bhauth · 18 May 2026 4:44 UTC

33 points

6 comments · 6 min read

(www.bhauth.com)

Why Physical Attractiveness Matters for Men’s Dating Prospects

johnswentworth · 18 May 2026 2:22 UTC

9 points

13 comments · 3 min read

Bay Summer Solstice 2026

Raemon · 18 May 2026 0:34 UTC

16 points

4 comments · 1 min read

How to Quit Fandom: Apostasy

Laiba Rehman ✦ RJ · 17 May 2026 21:09 UTC

58 points

3 comments · 4 min read

Engineering a Safer World: Risk Modelling — and Safety Engineering? — for AI Loss of Control

Oliver Sourbut · 17 May 2026 16:02 UTC

10 points

1 comment · 9 min read

(www.oliversourbut.net)

Next Token Prediction is a Misleading Term

Adam Newgas · 17 May 2026 11:58 UTC

12 points

2 comments · 6 min read

(www.boristhebrave.com)

Can ELK be brute-forced? Intertheoretic reduction

Q Home · 17 May 2026 10:21 UTC

13 points

0 comments · 3 min read

James C. Scott: Seeing Like a State

Martin Sustrik · 17 May 2026 8:40 UTC

56 points

6 comments · 7 min read

(www.250bpm.com)

How to Reason about Your Health Issues

Taylor G. Lunt · 17 May 2026 5:10 UTC

23 points

28 comments · 5 min read

Are You Not Rationalists?

J Thomas Moros · 17 May 2026 3:27 UTC

1 point

0 comments · 7 min read

Falling for the statistical parrot

FlorianH · 17 May 2026 1:02 UTC

5 points

0 comments · 2 min read

On getting unstuck

Joe Rogero · 17 May 2026 0:59 UTC

21 points

1 comment · 4 min read

(subatomicarticles.com)

A relatively brief explanation of Boltzmann Brains

Eliezer Yudkowsky · 16 May 2026 21:19 UTC

206 points

155 comments · 4 min read

Benchmarking Real Work

kaivu, leni, rohuang and zef

16 May 2026 20:43 UTC

30 points

2 comments · 4 min read

Critique Systems, Not Reality

Morphism · 16 May 2026 19:11 UTC

5 points

1 comment · 25 min read

(thothhermes.substack.com)

Trying to use NLAs to find out how Qwen 2.5 7B does multiplication

Hannes Thurnherr · 16 May 2026 19:05 UTC

23 points

4 comments · 6 min read

A Year Late, Claude Finally Beats Pokémon

Julian Bradshaw · 16 May 2026 7:05 UTC

162 points

12 comments · 9 min read

NLA Verbalizations on AuditBench: Llama 70B

Realmbird · 16 May 2026 5:25 UTC

10 points

0 comments · 3 min read

An Introduction to Exemplar Partitioning for Mechanistic Interpretability

Jessica Rumbelow · 16 May 2026 3:58 UTC

69 points

7 comments · 11 min read

(www.leap-labs.com)

An Argument for Analogies

James Stephen Brown · 16 May 2026 2:21 UTC

11 points

0 comments · 3 min read

Incriminating misaligned AI models via distillation

Alek Westover, SebastianP, Alex Mallen, Jozdien, Alexa Pan, Julian Stastny and Vivek Hebbar

15 May 2026 21:43 UTC

115 points

12 comments · 5 min read

Critical Thinking as a Gym Schedule

Alrenous · 15 May 2026 20:49 UTC

0 points

4 comments · 3 min read

Why I am not too worried about AIpocalypse: Scott Alexander vs Nicolaus Copernicus

Shmi · 15 May 2026 20:31 UTC

7 points

15 comments · 2 min read

Risk reports need to address deployment-time spread of misalignment

Alex Mallen · 15 May 2026 18:20 UTC

64 points

1 comment · 5 min read

Monthly Roundup #42: May 2026

Zvi · 15 May 2026 16:50 UTC

30 points

2 comments · 24 min read

(thezvi.wordpress.com)

Mechanistic estimation for expectations of random products

Jacob_Hilton, George Robinson, Eric Neyman, paulfchristiano, Mikewins, Victor Lecomte, Wilson Wu and Gabriel Wu

15 May 2026 16:50 UTC

50 points

0 comments · 5 min read

(www.alignment.org)

Clarifying the Darwinian Honeymoon

Elias Schmied · 15 May 2026 16:23 UTC

20 points

6 comments · 3 min read

Announcing the Center for Shared AI Prosperity

Dylan Matthews · 15 May 2026 12:57 UTC

39 points

13 comments · 2 min read

MATS 9 Retrospective & Advice

beyarkay (Boyd Kane) · 15 May 2026 12:30 UTC

198 points

11 comments · 18 min read

(boydkane.com)

Data Quality is Way Underrated, and We Should Start Funding It.

Osapinion · 15 May 2026 4:07 UTC

4 points

0 comments · 2 min read

(substack.com)

Don’t be too Clever to Take Obvious Advice

Hide · 15 May 2026 3:01 UTC

95 points

26 comments · 2 min read

(hidefromit.substack.com)

Some observations about NLA explanations

loops · 15 May 2026 2:15 UTC

21 points

0 comments · 3 min read

The hard core of alignment (is robustifying RL)

Cole Wyeth · 15 May 2026 1:02 UTC

39 points

12 comments · 13 min read

Convergent Abstraction Hypothesis

Jan_Kulveit · 15 May 2026 0:04 UTC

122 points

20 comments · 6 min read

Emma Baker on ADHD

koratkar · 14 May 2026 23:29 UTC

8 points

2 comments · 3 min read

(emma00baker.substack.com)

Designing AI factual claims for “easy verification”

Raemon · 14 May 2026 23:23 UTC

33 points

17 comments · 2 min read

Automated Alignment is Harder Than You Think

Aleksandr Bowkis, Marie_DB, Jacob Pfau and Geoffrey Irving

14 May 2026 22:01 UTC

143 points

6 comments · 3 min read

(arxiv.org)

2B scoring model flags out-of-domain misalignment, suggesting specialist judges have potential for audits

burnssa · 14 May 2026 20:00 UTC

8 points

0 comments · 6 min read

The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness

Charlie Griffin and Patrick Leask

14 May 2026 17:05 UTC

59 points

3 comments · 3 min read