If I Were Emperor of New AI Safety Researcher Training...

Lorxus

20 May 2026 23:10 UTC

21 points

3 comments 8 min read LW link

(tiled-with-pentagons.blogspot.com)

theory uplift differentially benefits safety & is underleveraged

yudhister

20 May 2026 21:43 UTC

133 points

14 comments 1 min read LW link

Singular Learning Theory Comprehensive − 1

Agastya Agrawal

20 May 2026 20:00 UTC

35 points

1 comment 12 min read LW link

Sparse Efficiency vs. Superposition: The Interpretability Tradeoff

hillz

20 May 2026 19:14 UTC

8 points

0 comments 1 min read LW link

The Case for Evaluating Model Behaviors

jsteinhardt

20 May 2026 18:42 UTC

40 points

3 comments 3 min read LW link

Toward Interoperability of Minimal Programs

johnswentworth

20 May 2026 18:37 UTC

67 points

13 comments 3 min read LW link

Fundamental Uncertainty $2,000 Essay Contest

Gordon Seidoh Worley

20 May 2026 15:20 UTC

25 points

4 comments 5 min read LW link

(www.uncertainupdates.com)

Synthetic Persona Pretraining: Alignment from Token Zero

Julian Minder, Raghav Singhal, Viktor Moskvoretskii, Stefan Krsteski, ashtonanderson, rolandaydin and Robert West

20 May 2026 14:16 UTC

112 points

26 comments 17 min read LW link

Give my children minds

momom2

20 May 2026 14:14 UTC

7 points

1 comment 1 min read LW link

Check out my technological uplifting, civilization-building, and science in a magic world fiction!

Jens Brandt

20 May 2026 12:30 UTC

6 points

0 comments 1 min read LW link

Power-seeking agents will likely be developed

Alec Harris

20 May 2026 9:26 UTC

42 points

0 comments 4 min read LW link

Apply now to Human-Aligned AI Summer School 2026

Anna Gajdova, Tomáš Gavenčiak, VojtaKovarik and Jan_Kulveit

20 May 2026 8:44 UTC

13 points

0 comments 1 min read LW link

(humanaligned.ai)

From 8B to Frontier: How System Prompts Control Whether AI Agents Blackmail, Leak, and Kill

Chijioke Ugwuanyi

20 May 2026 8:28 UTC

15 points

2 comments 19 min read LW link

If AI is normal technology, history is not reassuring.

Davidmanheim

20 May 2026 7:21 UTC

59 points

28 comments 6 min read LW link

Pythagorean addition

kqr

20 May 2026 7:13 UTC

32 points

4 comments 3 min read LW link

(entropicthoughts.com)

So you don’t want everybody to die

Rattengift

20 May 2026 5:10 UTC

−20 points

10 comments 6 min read LW link

Temporal Proportional Representation

thomascolthurst

20 May 2026 1:39 UTC

10 points

9 comments 3 min read LW link

Conclave 1492

Vaniver

19 May 2026 23:44 UTC

72 points

7 comments 1 min read LW link

Childhood And Education #19: Letting Kids Be Kids #2

Zvi

19 May 2026 22:20 UTC

21 points

1 comment 12 min read LW link

(thezvi.wordpress.com)

Implications Of Predicting The Next Token

jdp

19 May 2026 22:17 UTC

108 points

6 comments 31 min read LW link

(minihf.com)

Which goals actually motivate deceptive alignment?

Cleo Nardo and Alex Mallen

19 May 2026 21:53 UTC

25 points

0 comments 10 min read LW link

Housing Roundup #15: The War Against Renters

Zvi

19 May 2026 21:40 UTC

19 points

1 comment 14 min read LW link

(thezvi.wordpress.com)

Leaving DCA to the North on Foot

jefftk

19 May 2026 20:30 UTC

19 points

0 comments 1 min read LW link

(www.jefftk.com)

A Visual Guide to Natural Latents

Alfred Harwood

19 May 2026 19:10 UTC

56 points

0 comments 18 min read LW link

Humans are not automatically strategic — “inner work” edition

Chris Lakin

19 May 2026 18:37 UTC

36 points

0 comments 1 min read LW link

[Webinar]: How close is AI to taking my job? (And what the benchmarks aren’t telling us)

Schizoid Rentoid

19 May 2026 17:43 UTC

2 points

0 comments 1 min read LW link

We Need to Get Serious about Uplift Studies

frmsaul and Eye You

19 May 2026 17:21 UTC

23 points

0 comments 5 min read LW link

Brain Structure and IQ: How Myelin Elevates Intelligence

Shiva's Right Foot

19 May 2026 14:13 UTC

57 points

7 comments 12 min read LW link

Sealing Conditional Misalignment in Inoculation Prompting with Consistency Training

David Africa, Sukrati_Gautam and Neil Shah

19 May 2026 13:55 UTC

44 points

7 comments 6 min read LW link

Let’s have more partial insiders.

Cleo Nardo

19 May 2026 7:24 UTC

15 points

0 comments 2 min read LW link

Roadmap through AI safety programs for early-career technical researchers

Mikhail Mironov

19 May 2026 3:45 UTC

17 points

5 comments 5 min read LW link

When Fluency Is Free

mcawesome

19 May 2026 3:05 UTC

7 points

2 comments 1 min read LW link

The anthropic argument against the existence of God.

usrnmtaken

19 May 2026 3:05 UTC

−10 points

1 comment 6 min read LW link

Should Rationalists Looksmaxx?

albertcai

19 May 2026 3:03 UTC

9 points

2 comments 6 min read LW link

(albertjcai.substack.com)

AI emotions and aligned behavior

lisunshiny

19 May 2026 3:02 UTC

9 points

0 comments 5 min read LW link

(liannsun.com)

Tracking Difficulty with Feature Portfolios

kaivu, leni, zef and rohuang

19 May 2026 2:25 UTC

22 points

0 comments 5 min read LW link

Outsiders should focus on specs/constitutions (among other things)

Cleo Nardo

19 May 2026 1:04 UTC

4 points

5 comments 2 min read LW link

Logical Share Splitting for Intuitionists

DaemonicSigil

19 May 2026 0:42 UTC

19 points

9 comments 5 min read LW link

(notoneunusualthing.substack.com)

Coordinal: A Postmortem.

Ronak_Mehta

18 May 2026 20:43 UTC

37 points

3 comments 4 min read LW link

(ronakrm.github.io)

Noticing Confusion: A practice in staying curious

vmehra

18 May 2026 19:31 UTC

10 points

1 comment 6 min read LW link

Dating Roundup #12: Sex and Violence

Zvi

18 May 2026 19:20 UTC

28 points

1 comment 27 min read LW link

(thezvi.wordpress.com)

Negation Neglect: When models fail to learn negations in training

harrymayne, Lev McKinney and Owain_Evans

18 May 2026 18:37 UTC

119 points

37 comments 8 min read LW link

So are you some kind of communist?

jchan

18 May 2026 15:53 UTC

5 points

1 comment 3 min read LW link

Thoughts on interviewing candidates for AI safety fellowships

beyarkay (Boyd Kane)

18 May 2026 15:28 UTC

36 points

4 comments 7 min read LW link

(boydkane.com)

PauseAI Munich Local Group Kickoff

mofeien

18 May 2026 15:13 UTC

3 points

0 comments 1 min read LW link

Classifier Context Rot: Monitor Performance Degrades with Context Length

Fabien Roger and Sam Martin

18 May 2026 14:05 UTC

54 points

1 comment 4 min read LW link

How useful is cross-domain generalization for training LLM monitors?

Fabien Roger and Sam Martin

18 May 2026 13:52 UTC

21 points

0 comments 4 min read LW link

Jhana Quick Start Guide

Zmavli Caimle

18 May 2026 8:51 UTC

15 points

3 comments 11 min read LW link

Links #1: 2026/05 Part 1

papetoast

18 May 2026 5:04 UTC

10 points

0 comments 18 min read LW link

why pollen allergies?

bhauth

18 May 2026 4:44 UTC

33 points

6 comments 6 min read LW link

(www.bhauth.com)