
ASI motives and the ontonormative goods (re IABIED’s core argument)

Zsolt Tanko

3 May 2026 23:38 UTC

4 points

4 comments, 4 min read

How did ‘large’ language models get that way? The role of Transformers and Pretraining in GPT

Oliver Sourbut

3 May 2026 21:35 UTC

16 points

0 comments, 7 min read

(www.oliversourbut.net)

Dairy cows make their misery expensive (but their calves can’t)

Elizabeth

3 May 2026 19:20 UTC

159 points

1 comment, 6 min read

(acesounderglass.com)

[Question] Looking for papers on general formalizations of “agency”

lovagrus

3 May 2026 18:32 UTC

12 points

1 comment, 2 min read

Why I made Engineering Enigmas

kqr

3 May 2026 18:04 UTC

13 points

0 comments, 3 min read

Deontological bars should reference the actor’s beliefs

TFD

3 May 2026 15:09 UTC

8 points

6 comments, 3 min read

We don’t learn numbers from set cardinality

azergante

3 May 2026 11:33 UTC

4 points

15 comments, 3 min read

MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections

Realmbird

3 May 2026 11:06 UTC

21 points

3 comments, 5 min read

The Repugnant Lifespan Conclusion

XelaP

3 May 2026 9:22 UTC

12 points

20 comments, 3 min read

Pursuing the target

Adam Zerner

3 May 2026 7:59 UTC

30 points

1 comment, 2 min read

Paraphrasing Is (At Best) a Partial Defence Against Steganography in LLMs

Usman Anwar and robertzk

3 May 2026 7:53 UTC

14 points

0 comments, 8 min read

LLMs Choose the Safer Gamble Yet Price the Riskier One Higher

Jonathan Dang

3 May 2026 7:51 UTC

12 points

0 comments, 4 min read

Bypassing Refusal Behavior in Qwen Models via Activation Steering

Talib Mirza

3 May 2026 6:07 UTC

1 point

0 comments, 2 min read

Notes on equanimity from the inside

nonplus

2 May 2026 23:42 UTC

15 points

1 comment, 4 min read

Psychopathy: The Substrate

Dawn Drescher

2 May 2026 22:48 UTC

5 points

0 comments, 8 min read

(impartial-priorities.org)

Measuring the ability of Opus 4.5 to fool narrow classifiers

Fabien Roger and John Hughes

2 May 2026 22:43 UTC

31 points

0 comments, 8 min read

Evaluating different AIs on African livestock knowledge

Fatika Umar Ibrahim

2 May 2026 20:28 UTC

23 points

4 comments, 1 min read

Announcing Metaculus Summer 2026 FutureEval Bot Tournament

postreal

2 May 2026 20:27 UTC

1 point

0 comments, 4 min read

(www.metaculus.com)

You Are Not Immune To Mode Collapse

J Bostock

2 May 2026 19:57 UTC

127 points

18 comments, 4 min read

(jbostock.substack.com)

AI Risk Agility Plans—v0.1

Chris_Leong

2 May 2026 19:30 UTC

10 points

0 comments, 1 min read

A new rationalist self-improvement book: the 12 Levers

spencerg

2 May 2026 17:40 UTC

56 points

1 comment, 6 min read

OpenAI’s red line for AI self-improvement is fundamentally flawed

Charbel-Raphaël

2 May 2026 14:44 UTC

35 points

3 comments, 3 min read

Psychopathy: The Problem

Dawn Drescher

2 May 2026 10:23 UTC

19 points

16 comments, 11 min read

(impartial-priorities.org)

Games that change your mind

KatjaGrace

2 May 2026 7:40 UTC

74 points

42 comments, 3 min read

(worldspiritsockpuppet.com)

Understand why AI is a doom-risk in 39 captivating minutes

KatjaGrace

2 May 2026 7:40 UTC

18 points

0 comments, 1 min read

(worldspiritsockpuppet.com)

Primary Care Physicians are Incompetent. We Need More of Them.

Hide

2 May 2026 5:47 UTC

53 points

35 comments, 9 min read

(hidefromit.substack.com)

Contributing to Technical Research in the AI Safety End Game

Sturb

2 May 2026 3:17 UTC

34 points

1 comment, 4 min read

A Simulation of Social Groups Under A Gift Economy

Mira Kennard

2 May 2026 2:26 UTC

21 points

1 comment, 5 min read

Human-looking robots are a bad idea

martinkunev

2 May 2026 1:04 UTC

1 point

0 comments, 4 min read

How Go Players Disempower Themselves to AI

Ashe Vazquez Nuñez

1 May 2026 23:24 UTC

700 points

77 comments, 8 min read

Early-stage empirical work on “spillway motivations”

Arjun Khandelwal, Anders Cairns Woodruff and Alex Mallen

1 May 2026 21:29 UTC

26 points

3 comments, 8 min read

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Eyon Jang, Joschka Braun, Damon Falck and David Lindner

1 May 2026 20:54 UTC

24 points

0 comments, 8 min read

Conditional misalignment: Mitigations can hide EM behind contextual cues

Jan Dubiński and Owain_Evans

1 May 2026 20:09 UTC

67 points

2 comments, 11 min read

Ambitious Mech Interp w/ Tensor-transformers on toy languages [Project Proposal]

Logan Riggs

1 May 2026 19:17 UTC

21 points

0 comments, 2 min read

Risk from fitness-seeking AIs: mechanisms and mitigations

Alex Mallen

1 May 2026 17:42 UTC

107 points

0 comments, 32 min read

Your four-dimensional body

PatrickDFarley

1 May 2026 17:22 UTC

8 points

1 comment, 3 min read

Housing Roundup #14: You Can’t Build That

Zvi

1 May 2026 16:50 UTC

25 points

1 comment, 23 min read

(thezvi.wordpress.com)

What do Russian olympiad winners think of HPMOR? Our data

Mikhail Samin

1 May 2026 13:28 UTC

21 points

2 comments, 1 min read

Housing Roundup #13: More Dakka

Zvi

1 May 2026 13:00 UTC

13 points

1 comment, 13 min read

(thezvi.wordpress.com)

Qualia are internal variables but they are taken from a different realm

avturchin

1 May 2026 10:43 UTC

9 points

13 comments, 2 min read

Open strategic questions for digital minds

lucius

1 May 2026 9:56 UTC

26 points

1 comment, 13 min read

(outpaced.substack.com)

Juriscription: finding the medicines missing somewhere

technicalities

1 May 2026 9:55 UTC

29 points

1 comment, 2 min read

Self driving interview

KatjaGrace

1 May 2026 8:30 UTC

16 points

0 comments, 4 min read

(worldspiritsockpuppet.com)

11 ways to be less deferential

KatjaGrace

1 May 2026 8:00 UTC

22 points

3 comments, 2 min read

(worldspiritsockpuppet.com)

Sanity-checking “Incompressible Knowledge Probes”

Sturb and LawrenceC

1 May 2026 6:52 UTC

60 points

12 comments, 16 min read

Inkhaven, the 548th metapost

Sean Herrington

1 May 2026 6:49 UTC

1 point

1 comment, 3 min read

Automating Interpretability with Agents

Jack Payne, ksena and Georg Lange

1 May 2026 2:59 UTC

10 points

0 comments, 10 min read

Against In-Duct UV

jefftk

1 May 2026 2:40 UTC

23 points

0 comments, 3 min read

(www.jefftk.com)

Winners of the Manifund Essay Prize

Austin Chen

1 May 2026 2:21 UTC

6 points

0 comments, 11 min read

(manifund.substack.com)

Reflections on InkHaven

David Scott Krueger

1 May 2026 0:50 UTC

10 points

0 comments, 2 min read

(therealartificialintelligence.substack.com)