Toward a Better Evaluations Ecosystem

Benjamin Arnav

5 May 2026 22:29 UTC

24 points

0 comments, 5 min read, LW link

Model Spec Midtraining: Improving How Alignment Training Generalizes

Chloe Li, Nevan Wichers, saraprice, Sam Marks and Jonathan Kutasov

5 May 2026 21:55 UTC

71 points

7 comments, 7 min read, LW link

(alignment.anthropic.com)

Positive Feedback Only

Florian_Dietz

5 May 2026 21:28 UTC

18 points

0 comments, 8 min read, LW link

What if LLMs are mostly crystallized intelligence?

deep

5 May 2026 20:50 UTC

45 points

10 comments, 9 min read, LW link

(expectedsurprise.substack.com)

Decision theory doesn’t prove that useful strong AIs will doom us all

deep

5 May 2026 20:47 UTC

8 points

0 comments, 9 min read, LW link

(expectedsurprise.substack.com)

Psychopathy: The Mechanics

Dawn Drescher

5 May 2026 20:26 UTC

2 points

0 comments, 10 min read, LW link

(impartial-priorities.org)

A Federal Inmate Asks: Was My Prosecution Rational?

seth_tins

5 May 2026 19:56 UTC

11 points

2 comments, 5 min read, LW link

The AI Ad-Hoc Prior Restraint Era Begins

Zvi

5 May 2026 19:30 UTC

63 points

5 comments, 10 min read, LW link

(thezvi.wordpress.com)

Your rights when flying to Europe

Yair Halberstadt

5 May 2026 19:17 UTC

92 points

14 comments, 5 min read, LW link

[Linkpost] Interpreting Language Model Parameters

Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors and Lee Sharkey

5 May 2026 17:37 UTC

162 points

2 comments, 2 min read, LW link

(www.goodfire.ai)

Motivated reasoning, confirmation bias, and AI risk theory

Seth Herd

5 May 2026 15:56 UTC

66 points

18 comments, 41 min read, LW link

The Best Argument Against Deontology Is About Suitcases

Bentham's Bulldog

5 May 2026 15:24 UTC

−1 points

11 comments, 19 min read, LW link

Codesign for Legibility (to AI and Everyone Else)

Adam Chlipala

5 May 2026 13:46 UTC

1 point

0 comments, 7 min read, LW link

Dawn of the “national security” tier of AI

Mitchell_Porter

5 May 2026 9:40 UTC

16 points

3 comments, 1 min read, LW link

Forbidden Backrooms: Self-Chat with a Refusal-Abliterated LLM

AlliedToasters

5 May 2026 7:55 UTC

9 points

0 comments, 5 min read, LW link

Training Model to Predict Its Own Generalization: A Preliminary Study

Tianyi (Alex) Qiu

5 May 2026 5:50 UTC

17 points

0 comments, 7 min read, LW link

Are you looking up?

Craig Green

5 May 2026 3:03 UTC

42 points

2 comments, 8 min read, LW link

(open.substack.com)

Alarming Scheduling

jefftk

5 May 2026 2:40 UTC

26 points

9 comments, 1 min read, LW link

(www.jefftk.com)

Don’t solve people’s problems for them

Declan Molony

5 May 2026 0:02 UTC

10 points

13 comments, 4 min read, LW link

April 2026 Links

nomagicpill

4 May 2026 23:17 UTC

8 points

1 comment, 11 min read, LW link

My speech at the PauseAI Capitol protest April 13

maia

4 May 2026 22:51 UTC

30 points

1 comment, 2 min read, LW link

Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI

Eliezer Yudkowsky

4 May 2026 22:11 UTC

367 points

132 comments, 22 min read, LW link

It’s nice of you to worry about me, but I really do have a life

Viliam

4 May 2026 21:14 UTC

332 points

61 comments, 4 min read, LW link

Psychopathy: The Self

Dawn Drescher

4 May 2026 21:13 UTC

20 points

0 comments, 15 min read, LW link

(impartial-priorities.org)

Verbalized Eval Awareness Inflates Measured Safety

Santiago Aranguri and Joseph Bloom

4 May 2026 20:02 UTC

44 points

0 comments, 29 min read, LW link

Housing Roundup #15: The War Against Renters

Zvi

4 May 2026 20:01 UTC

34 points

5 comments, 14 min read, LW link

(thezvi.wordpress.com)

The AI Industrial Explosion — Part 1: Maximum growth rates with current production methods

djbinder

4 May 2026 15:32 UTC

106 points

11 comments, 12 min read, LW link

(defensesindepth.bio)

Apply for ARBOx4 [deadline May 8th]

Juliana Eberschlag and James Lester

4 May 2026 15:28 UTC

13 points

0 comments, 3 min read, LW link

Taking woo seriously but not literally

Kaj_Sotala

4 May 2026 13:36 UTC

123 points

27 comments, 23 min read, LW link

(kajsotala.substack.com)

The Threat of AI Crimes Are Under-Appreciated

Joshua Krook

4 May 2026 11:33 UTC

3 points

8 comments, 3 min read, LW link

Psychopathy: The Shaping

Dawn Drescher

4 May 2026 7:58 UTC

3 points

0 comments, 8 min read, LW link

(impartial-priorities.org)

Conflict 2.0: Leaving behind shame/fault, right/wrong

honeybee

4 May 2026 3:07 UTC

6 points

0 comments, 4 min read, LW link

Enneagram Epicycles

Gordon Seidoh Worley

4 May 2026 3:00 UTC

9 points

1 comment, 4 min read, LW link

(www.uncertainupdates.com)

Auto-review of agent actions without synchronous human oversight

papetoast

4 May 2026 2:12 UTC

6 points

0 comments, 1 min read, LW link

(alignment.openai.com)

ASI motives and the ontonormative goods (re IABIED’s core argument)

Zsolt Tanko

3 May 2026 23:38 UTC

4 points

4 comments, 4 min read, LW link

How did ‘large’ language models get that way? The role of Transformers and Pretraining in GPT

Oliver Sourbut

3 May 2026 21:35 UTC

16 points

0 comments, 7 min read, LW link

(www.oliversourbut.net)

Dairy cows make their misery expensive (but their calves can’t)

Elizabeth

3 May 2026 19:20 UTC

159 points

1 comment, 6 min read, LW link

(acesounderglass.com)

[Question] Looking for papers on general formalizations of “agency”

lovagrus

3 May 2026 18:32 UTC

12 points

1 comment, 2 min read, LW link

Why I made Engineering Enigmas

kqr

3 May 2026 18:04 UTC

13 points

0 comments, 3 min read, LW link

Deontological bars should reference the actor’s beliefs

TFD

3 May 2026 15:09 UTC

8 points

6 comments, 3 min read, LW link

We don’t learn numbers from set cardinality

azergante

3 May 2026 11:33 UTC

4 points

15 comments, 3 min read, LW link

MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections

Realmbird

3 May 2026 11:06 UTC

21 points

3 comments, 5 min read, LW link

The Repugnant Lifespan Conclusion

XelaP

3 May 2026 9:22 UTC

12 points

20 comments, 3 min read, LW link

Pursuing the target

Adam Zerner

3 May 2026 7:59 UTC

30 points

1 comment, 2 min read, LW link

Paraphrasing Is (At Best) a Partial Defence Against Steganography in LLMs

Usman Anwar and robertzk

3 May 2026 7:53 UTC

14 points

0 comments, 8 min read, LW link

LLMs Choose the Safer Gamble Yet Price the Riskier One Higher

Jonathan Dang

3 May 2026 7:51 UTC

12 points

0 comments, 4 min read, LW link

Bypassing Refusal Behavior in Qwen Models via Activation Steering

Talib Mirza

3 May 2026 6:07 UTC

1 point

0 comments, 2 min read, LW link

Notes on equanimity from the inside

nonplus

2 May 2026 23:42 UTC

15 points

1 comment, 4 min read, LW link

Psychopathy: The Substrate

Dawn Drescher

2 May 2026 22:48 UTC

5 points

0 comments, 8 min read, LW link

(impartial-priorities.org)

Measuring the ability of Opus 4.5 to fool narrow classifiers

Fabien Roger and John Hughes

2 May 2026 22:43 UTC

31 points

0 comments, 8 min read, LW link