We Should Have Mandatory Media/Communications Training For All Communicators

Darren McKee 8 May 2026 20:29 UTC

2 points

6 comments 3 min read LW link

Chess as a prediction model of the artificial intelligence impact on culture

849 8 May 2026 20:19 UTC

−12 points

1 comment 5 min read LW link

(lojkine.art)

The Saturation View: some responses

wdmacaskill 8 May 2026 17:32 UTC

25 points

6 comments 8 min read LW link

Is ProgramBench Impossible?

frmsaul 8 May 2026 17:04 UTC

83 points

11 comments 2 min read LW link

Claude Code, Codex and Agentic Coding #8

Zvi 8 May 2026 16:40 UTC

45 points

1 comment 11 min read LW link

(thezvi.wordpress.com)

AI is Breaking Two Vulnerability Cultures

jefftk 8 May 2026 15:50 UTC

78 points

0 comments 2 min read LW link

(www.jefftk.com)

Please Be Serious

Oliver Kuperman 8 May 2026 14:36 UTC

−11 points

15 comments 2 min read LW link

Write Cause You Have Something to Say

Logan Riggs 8 May 2026 13:36 UTC

37 points

5 comments 2 min read LW link

Userland Alignment

Josh H 8 May 2026 13:31 UTC

4 points

0 comments 2 min read LW link

A benchmark is a sensor

Håvard Tveit Ihle and Mathias Bynke

8 May 2026 13:24 UTC

36 points

4 comments 3 min read LW link

Bringing More Expertise to Bear on Alignment

Edmund Lau, Geoffrey Irving, Cameron Holmes and David Africa

8 May 2026 10:29 UTC

87 points

1 comment 8 min read LW link

The Jailbroken Boy of Rushmore

jdcampolargo 8 May 2026 6:29 UTC

24 points

0 comments 10 min read LW link

Investigating the consequences of accidentally grading CoT during RL

papetoast 8 May 2026 6:17 UTC

24 points

0 comments 1 min read LW link

(alignment.openai.com)

Uncertain Updates: May 2026

Gordon Seidoh Worley 8 May 2026 1:20 UTC

14 points

2 comments 1 min read LW link

(www.uncertainupdates.com)

The Frictionless Double

zw5 7 May 2026 23:11 UTC

10 points

4 comments 8 min read LW link

The AI industry is where banking was in 2006. (We’re hiring)

felixgaston 7 May 2026 21:52 UTC

53 points

1 comment 2 min read LW link

(forum.effectivealtruism.org)

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

Subhash Kantamneni, kitft, Euan Ong and Sam Marks

7 May 2026 20:21 UTC

213 points

35 comments 8 min read LW link

Axes of Planning in LLMs + Partial Lit Review

NickyP 7 May 2026 19:53 UTC

12 points

0 comments 9 min read LW link

(blog.sus.cat)

A review of “Investigating the consequences of accidentally grading CoT during RL”

Buck 7 May 2026 18:06 UTC

76 points

1 comment 8 min read LW link

Try, even if they have you cold

WalterL 7 May 2026 17:19 UTC

102 points

14 comments 2 min read LW link

Mechanistic estimation for wide random MLPs

Jacob_Hilton 7 May 2026 16:20 UTC

85 points

5 comments 5 min read LW link

(www.alignment.org)

Over Eight Months of Progress in Two: Analyzing the Mythos Preview Capability Jump

Alvin Ånestrand 7 May 2026 16:19 UTC

10 points

8 comments 17 min read LW link

(forecastingaifutures.substack.com)

AI #167: The Prior Restraint Era Begins

Zvi 7 May 2026 13:50 UTC

39 points

7 comments 45 min read LW link

(thezvi.wordpress.com)

How to get better at chess (and everything else)

Sean Herrington 7 May 2026 11:17 UTC

11 points

0 comments 3 min read LW link

(www.chess.com)

Multipolar Civilisation Depends on Maintaining an Attacker’s Dilemma

Naci Cankaya 7 May 2026 11:13 UTC

27 points

1 comment 5 min read LW link

(nacicankaya.substack.com)

Sculpted Interaction: a Design-First Approach to AI Alignment

magfrump 6 May 2026 23:47 UTC

15 points

0 comments 7 min read LW link

Psychopathy: The Choice

Dawn Drescher 6 May 2026 22:23 UTC

22 points

0 comments 17 min read LW link

(impartial-priorities.org)

Many individual CEVs are probably quite bad

Viliam 6 May 2026 20:18 UTC

109 points

32 comments 3 min read LW link

Blind deep-deployment evals for control & sabotage

Dylan Bowman 6 May 2026 19:54 UTC

27 points

0 comments 2 min read LW link

Using Base-LCM to Monitor LLMs

Éloïse Benito-Rodriguez and NickyP

6 May 2026 19:28 UTC

−1 points

0 comments 4 min read LW link

Agent Ontology: A Constraint-Based Approach

tamas.bartha 6 May 2026 19:26 UTC

−9 points

0 comments 9 min read LW link

Will Claude cause the next Covid?

Kate Delbeke 6 May 2026 19:26 UTC

3 points

0 comments 4 min read LW link

SVD on Weight Differences for Model Auditing

Mukesh R 6 May 2026 19:26 UTC

14 points

0 comments 7 min read LW link

Half an argument against the (rationalist’s) many worlds interpretation

Bill Jackson 6 May 2026 19:22 UTC

1 point

0 comments 3 min read LW link

(billjackson7.substack.com)

AI Safety HK: Social #1 + Reading Group #1

Schizoid Rentoid 6 May 2026 19:21 UTC

2 points

0 comments 1 min read LW link

AI Safety Hong Kong: Social #1 + Reading group #1

Schizoid Rentoid 6 May 2026 19:21 UTC

2 points

0 comments 1 min read LW link

Preliminary Evidence for Value Convergence in AI models

John Matrix 6 May 2026 19:15 UTC

1 point

1 comment 7 min read LW link

Drifting

Priyanka Bharadwaj 6 May 2026 19:14 UTC

6 points

0 comments 2 min read LW link

A draft honesty policy for credible communication with AI systems

Mia Taylor, Lukas Finnveden and Max Dalton

6 May 2026 18:50 UTC

3 points

0 comments 13 min read LW link

(www.forethought.org)

x-risk-themed

kave 6 May 2026 15:16 UTC

230 points

23 comments 3 min read LW link

(kaverennedy.substack.com)

Monday AI Radar #24

Against Moloch 6 May 2026 15:05 UTC

10 points

3 comments 8 min read LW link

(againstmoloch.substack.com)

AI Safety at the Frontier: Paper Highlights of April 2026

gasteigerjo 6 May 2026 13:58 UTC

18 points

1 comment 10 min read LW link

What is Anthropic?

Zvi 6 May 2026 13:30 UTC

65 points

4 comments 10 min read LW link

(thezvi.wordpress.com)

There is no evidence you should reapply sunscreen every 2 hours.

Hide 6 May 2026 9:19 UTC

85 points

14 comments 9 min read LW link

(hidefromit.substack.com)

Building An Ancestor Simulation #2

Mira Kennard 6 May 2026 8:21 UTC

5 points

0 comments 5 min read LW link

Psychopathy: The Types

Dawn Drescher 6 May 2026 7:35 UTC

1 point

0 comments 10 min read LW link

(impartial-priorities.org)

Toward a Better Evaluations Ecosystem

Benjamin Arnav 5 May 2026 22:29 UTC

24 points

0 comments 5 min read LW link

Model Spec Midtraining: Improving How Alignment Training Generalizes

Chloe Li, Nevan Wichers, saraprice, Sam Marks and Jonathan Kutasov

5 May 2026 21:55 UTC

71 points

7 comments 7 min read LW link

(alignment.anthropic.com)

Positive Feedback Only

Florian_Dietz 5 May 2026 21:28 UTC

18 points

0 comments 8 min read LW link

What if LLMs are mostly crystallized intelligence?

deep 5 May 2026 20:50 UTC

45 points

10 comments 9 min read LW link

(expectedsurprise.substack.com)