A Call for Better Type Hints in AI Safety Tooling

Koby Lewis

28 May 2026 23:04 UTC

13 points

2 comments

4 min read

(kobylewis.net)

Claude… doesn’t know who you are?

Smaug123

28 May 2026 22:54 UTC

59 points

23 comments

1 min read

Lizards and Less Wrong Jargon—A Brief Critique of Convention

DanielW

28 May 2026 22:18 UTC

28 points

8 comments

4 min read

Mnemonic portraits for 19,023 human genes

Brinedew

28 May 2026 22:16 UTC

340 points

28 comments

15 min read

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

Daan Henselmans, Arno Libert and LennardZ

28 May 2026 21:26 UTC

8 points

13 comments

2 min read

Use Decision Theory To Fix Your Bad Habits

enterthewoods

28 May 2026 19:31 UTC

8 points

5 comments

2 min read

Do Models Lie More to Other Models?

keith_wynroe

28 May 2026 19:28 UTC

13 points

0 comments

6 min read

We Should Study the Analogy Between Inoculation Prompting Non-Robustness, Negation Neglect, and Backdoor Non-Robustness

Vladimir Ivanov

28 May 2026 19:17 UTC

17 points

3 comments

4 min read

Some Dating Stories

johnswentworth

28 May 2026 18:57 UTC

−2 points

38 comments

11 min read

Does Claude care about others the same way humans do?

Simon Lermen

28 May 2026 18:41 UTC

28 points

24 comments

4 min read

Trans-Humeanism. The Problem of Induction Revisited

mfatt

28 May 2026 18:10 UTC

0 points

0 comments

2 min read

Advice for making robust-to-training model organisms

SebastianP, Alek Westover, Vivek Hebbar, Julian Stastny and Dylan Xu

28 May 2026 17:26 UTC

37 points

8 comments

12 min read

(blog.redwoodresearch.org)

The Patron Saint of Empiricism

Gram Stone

28 May 2026 17:03 UTC

2 points

0 comments

8 min read

Advice for budding research managers/coaches after 6 months at MATS

TheManxLoiner

28 May 2026 16:25 UTC

12 points

0 comments

3 min read

(lovkush.substack.com)

ARC’s “Outperforming Random Sampling” explained

mfatt

28 May 2026 15:46 UTC

6 points

0 comments

11 min read

Black Boxes for Low-Stakes, Interpretable AI for High-Stakes

Logan Riggs

28 May 2026 15:34 UTC

18 points

0 comments

2 min read

Infinite ethics and UDASSA

David Matolcsi

28 May 2026 14:40 UTC

59 points

17 comments

21 min read

AI #170: Lack of Executive Order

Zvi

28 May 2026 14:20 UTC

40 points

5 comments

50 min read

(thezvi.wordpress.com)

How can the middle powers avoid getting trounced during the intelligence explosion? A plan.

Tom Davidson

28 May 2026 13:39 UTC

40 points

3 comments

7 min read

(newsletter.forethought.org)

Social agency

Elias Schmied

28 May 2026 13:10 UTC

12 points

2 comments

10 min read

Glasswing exposed a governance gap

callumzc

28 May 2026 11:09 UTC

7 points

0 comments

5 min read

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking

Nathaniel Mitrani, Rhea Karty, dwk and Alan Cooney

28 May 2026 10:50 UTC

22 points

0 comments

8 min read

(arxiv.org)

How far behind are open models?

Håvard Tveit Ihle

28 May 2026 9:41 UTC

18 points

9 comments

6 min read

Using Bayesian Reasoning to Resolve Probability Paradoxes

martinkunev

28 May 2026 1:37 UTC

11 points

0 comments

5 min read

Atomically precise mechanosynthesis of carbon structures on hydrogenated Si(100) by inverted-mode STM

Matrice Jacobine

28 May 2026 0:32 UTC

20 points

3 comments

1 min read

(arxiv.org)

Working Memory Expansion

Elliot Callender

28 May 2026 0:23 UTC

12 points

1 comment

4 min read

Constitutional AI Alignment

RogerDearnaley

27 May 2026 22:29 UTC

27 points

9 comments

47 min read

LLMs Through the Eyes of Vinge

Gordon Seidoh Worley

27 May 2026 20:20 UTC

52 points

2 comments

4 min read

(www.uncertainupdates.com)

Biologically Plausible SGD Is Hard

Elliot Callender

27 May 2026 19:34 UTC

8 points

0 comments

1 min read

Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming

Jasmine Li and Alex Turner

27 May 2026 19:33 UTC

73 points

5 comments

10 min read

(turntrout.com)

no, Magnifica Humanitas is not AI-written

bhauth

27 May 2026 19:26 UTC

−13 points

18 comments

3 min read

Albuquerque ACX Meetup

Mary

27 May 2026 18:27 UTC

2 points

0 comments

1 min read

Full automation of AI R&D probably yields a large speed up even without a software-only singularity

ryan_greenblatt

27 May 2026 18:16 UTC

67 points

17 comments

3 min read

Not Prosthetics

Elliot Callender

27 May 2026 17:22 UTC

11 points

0 comments

2 min read

BCI Cognition Enhancement is Possible

Elliot Callender

27 May 2026 17:19 UTC

17 points

0 comments

1 min read

The ballad of TIGIT

Abhishaike Mahajan

27 May 2026 17:04 UTC

84 points

1 comment

9 min read

Leveraging Introspection for Alignment

Yotam

27 May 2026 16:54 UTC

25 points

3 comments

7 min read

Announcing Geodesic Research

Puria, Cam, Alexandra Narin, Edward James Young and Kyle O’Brien

27 May 2026 16:40 UTC

74 points

1 comment

5 min read

AI as a Social Technology, by Henry Farell

TheManxLoiner

27 May 2026 13:41 UTC

15 points

0 comments

3 min read

(lovkush.substack.com)

More capable AI, less money raised

Shoshannah Tekofsky

27 May 2026 12:57 UTC

28 points

2 comments

3 min read

(theaidigest.org)

Quantitative AI risk assessment: a starting point

Henry Papadatos, jakub_krys, malcolmmurray and Renn Karageorgieva

27 May 2026 9:42 UTC

38 points

7 comments

11 min read

(www.safer-ai.org)

[paper] Training on Documents About Monitoring Leads to CoT Obfuscation

Reilly Haskins, bilalchughtai and Josh Engels

27 May 2026 9:39 UTC

31 points

1 comment

4 min read

(arxiv.org)

No frontier model has acceptable levels of compliance with the EU AI Act and privacy legislation.

Daan Henselmans, Arno Libert, Amber Koelfat and LennardZ

27 May 2026 7:35 UTC

29 points

0 comments

9 min read

Thinking outside the box? LLM analysis of simplified cooperative poker

Dentosal

27 May 2026 7:28 UTC

15 points

0 comments

4 min read

Standard deviations from just two values

kqr

27 May 2026 5:01 UTC

41 points

2 comments

3 min read

(entropicthoughts.com)

Contra Wentworth on Physical Attractiveness for Men

Gretta Duleba

26 May 2026 23:20 UTC

123 points

25 comments

8 min read

Training Language Models for Controlled Stochasticity

Sruthi Kuriakose and Davide Baldelli

26 May 2026 22:17 UTC

18 points

0 comments

5 min read

Are Mythos’ Cyber Capabilities Overstated? - Yes and No

Muhan Luo

26 May 2026 22:17 UTC

7 points

1 comment

10 min read

Should we train LLMs to be human?

Hubert Plisiecki

26 May 2026 22:16 UTC

3 points

0 comments

2 min read

Steering Directions Are Explanations, Not Handles

JackYoung27

26 May 2026 22:15 UTC

8 points

0 comments

7 min read