Coherent Care

abramdemski 27 Feb 2026 21:59 UTC

41 points

2 comments · 16 min read · LW link

The tick in my back

benjamin ar 27 Feb 2026 21:49 UTC

12 points

0 comments · 4 min read · LW link

(bjar.substack.com)

Side by Side Comparison of RSP Versions

Corm 27 Feb 2026 21:11 UTC

18 points

0 comments · 1 min read · LW link

Anthropic and the DoW: Anthropic Responds

Zvi 27 Feb 2026 20:50 UTC

56 points

3 comments · 25 min read · LW link

(thezvi.wordpress.com)

Ball+Gravity has a “Downhill” Preference

TristanTrim 27 Feb 2026 19:12 UTC

8 points

0 comments · 2 min read · LW link

Safe ASI Is Achievable: The Finite Game Argument

Lester Leong 27 Feb 2026 18:50 UTC

9 points

7 comments · 22 min read · LW link

[Question] Best short introductions to AI safety & alignment for bright college students?

geoffreymiller 27 Feb 2026 18:04 UTC

7 points

0 comments · 1 min read · LW link

New ARENA material: 8 exercise sets on alignment science & interpretability

CallumMcDougall 27 Feb 2026 17:37 UTC

104 points

1 comment · 7 min read · LW link

3 Challenges and 2 Hopes for the Safety of Unsupervised Elicitation

Callum Canavan, Aditya Shrivastava, Allison Qi, Jonathan Michala and Fabien Roger

27 Feb 2026 17:25 UTC

21 points

0 comments · 10 min read · LW link

The Dawn of AI Scheming

Alvin Ånestrand 27 Feb 2026 17:24 UTC

19 points

0 comments · 59 min read · LW link

(forecastingaifutures.substack.com)

Sam Altman says OpenAI shares Anthropic’s red lines in Pentagon fight

Matrice Jacobine 27 Feb 2026 15:42 UTC

77 points

14 comments · 3 min read · LW link

(www.axios.com)

AI Security Bootcamp Singapore—Call for Applications

Pranav Gade and Red Bermejo

27 Feb 2026 13:34 UTC

5 points

0 comments · 3 min read · LW link

What I Got From 1.5 Years In Slightly-Competitive Debate

CarolusRenniusVitellius 27 Feb 2026 5:37 UTC

23 points

6 comments · 8 min read · LW link

(charlesr-w.github.io)

Here’s to the Polypropylene Makers

jefftk 27 Feb 2026 4:00 UTC

554 points

19 comments · 2 min read · LW link

(www.jefftk.com)

Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior

aditya singh, gersonkroiz, Senthooran Rajamanoharan and Neel Nanda

27 Feb 2026 3:20 UTC

60 points

12 comments · 78 min read · LW link

Vibe Coding is a System Design Interview

Brendan Long 27 Feb 2026 0:16 UTC

25 points

5 comments · 1 min read · LW link

(www.brendanlong.com)

Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”

Matrice Jacobine 26 Feb 2026 23:45 UTC

159 points

22 comments · 3 min read · LW link

(www.anthropic.com)

Asymmetric Risks of Unfaithful Reasoning: Omission as the Critical Failure Mode for AI Monitoring

Divyansh Singhvi 26 Feb 2026 21:22 UTC

7 points

0 comments · 4 min read · LW link

Getting Back To It

sarahconstantin 26 Feb 2026 20:30 UTC

38 points

1 comment · 7 min read · LW link

(sarahconstantin.substack.com)

The Voices That Are Missing From Sex-Themed Online Communities

Bowl of Cereal 26 Feb 2026 20:23 UTC

−19 points

6 comments · 1 min read · LW link

Inference-time Generative Debates on Coding and Reasoning Tasks for Scalable Oversight

ethanelasky and frank_b_n

26 Feb 2026 20:11 UTC

8 points

0 comments · 6 min read · LW link

A minor point about instrumental convergence that I would like feedback on

agrippa 26 Feb 2026 19:44 UTC

4 points

5 comments · 2 min read · LW link

AI welfare as a demotivator for takeover.

Valentin2026 26 Feb 2026 18:31 UTC

5 points

0 comments · 3 min read · LW link

Frontier AI companies probably can’t leave the US

Anders Cairns Woodruff 26 Feb 2026 18:18 UTC

137 points

19 comments · 7 min read · LW link

(blog.redwoodresearch.org)

Improving Internal Model Principle

mremre 26 Feb 2026 17:33 UTC

15 points

0 comments · 11 min read · LW link

A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior

harrymayne, Justin kang, Dewi Gould and noahys

26 Feb 2026 17:03 UTC

26 points

0 comments · 4 min read · LW link

How Robust Is Monitoring Against Secret Loyalties?

Joe Kwon 26 Feb 2026 15:50 UTC

8 points

0 comments · 5 min read · LW link

UFO Aliens Are Your Gods

Lord Dreadwar 26 Feb 2026 13:32 UTC

−49 points

18 comments · 4 min read · LW link

AI #157: Burn the Boats

Zvi 26 Feb 2026 13:30 UTC

48 points

12 comments · 58 min read · LW link

(thezvi.wordpress.com)

How eval awareness might emerge in training

Igor Ivanov 26 Feb 2026 10:59 UTC

26 points

12 comments · 6 min read · LW link

Strategic nuclear war twice as likely to occur by accident than by AI decisions according to new study

kromem 26 Feb 2026 8:29 UTC

43 points

1 comment · 5 min read · LW link

What is Claude?

epicurus 26 Feb 2026 4:26 UTC

14 points

0 comments · 7 min read · LW link

Why is Anthropic is okay with being used for disinformation?

ChristianKl 26 Feb 2026 4:20 UTC

13 points

6 comments · 1 min read · LW link

Scoop: Pentagon takes first step toward blacklisting Anthropic

Matrice Jacobine 26 Feb 2026 3:10 UTC

15 points

1 comment · 1 min read · LW link

(www.axios.com)

Transformers Have Computational Signatures Orthogonal to Semantic Content

luxia 26 Feb 2026 2:55 UTC

10 points

2 comments · 13 min read · LW link

Alignment as Neural Integration: AI as a Cognitive Layer Accountable to Human Limbic Grounding

Ian Williams 26 Feb 2026 2:51 UTC

2 points

1 comment · 7 min read · LW link

Investing in light of AI risk

AshL 26 Feb 2026 2:51 UTC

7 points

0 comments · 5 min read · LW link

Whack-a-Mole is Not a Winnable Game

Sable 26 Feb 2026 2:40 UTC

101 points

26 comments · 18 min read · LW link

(affablyevil.substack.com)

Announcing ControlConf 2026

Buck 26 Feb 2026 2:23 UTC

82 points

4 comments · 2 min read · LW link

Ensuring Safety in Mixed Deployment

Cleo Nardo 26 Feb 2026 2:15 UTC

22 points

0 comments · 5 min read · LW link

Map the Future Before You Build It

Molly and Deger Turan

26 Feb 2026 1:50 UTC

12 points

0 comments · 2 min read · LW link

(www.metaculus.com)

Schmidt Sciences’ request for proposals on the Science of Trustworthy AI

James Fox 25 Feb 2026 21:42 UTC

31 points

0 comments · 12 min read · LW link

(schmidtsciences.smapply.io)

Naloe: A True Program Editor

TristanTrim 25 Feb 2026 21:08 UTC

8 points

4 comments · 3 min read · LW link

Anthropic and the Department of War

Zvi 25 Feb 2026 21:00 UTC

89 points

10 comments · 33 min read · LW link

(thezvi.wordpress.com)

Does the First Amendment protect Anthropic from Hegseth?

TFD 25 Feb 2026 21:00 UTC

10 points

0 comments · 2 min read · LW link

(www.thefloatingdroid.com)

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels 25 Feb 2026 19:43 UTC

81 points

5 comments · 8 min read · LW link

What secret goals does Claude think it has?

loops 25 Feb 2026 19:22 UTC

93 points

11 comments · 4 min read · LW link

Splitting the Sun Equally

Commander Zander 25 Feb 2026 18:49 UTC

8 points

1 comment · 3 min read · LW link

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon 25 Feb 2026 18:17 UTC

14 points

0 comments · 3 min read · LW link

Training Agents to Self-Report Misbehavior

Bruce W. Lee, Yueh Han "John" Chen and Tomek Korbak

25 Feb 2026 17:50 UTC

26 points

0 comments · 8 min read · LW link