Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”

Matrice Jacobine, 26 Feb 2026 23:45 UTC

159 points

22 comments, 3 min read

(www.anthropic.com)

Asymmetric Risks of Unfaithful Reasoning: Omission as the Critical Failure Mode for AI Monitoring

Divyansh Singhvi, 26 Feb 2026 21:22 UTC

7 points

0 comments, 4 min read

Getting Back To It

sarahconstantin, 26 Feb 2026 20:30 UTC

38 points

1 comment, 7 min read

(sarahconstantin.substack.com)

The Voices That Are Missing From Sex-Themed Online Communities

Bowl of Cereal, 26 Feb 2026 20:23 UTC

−19 points

6 comments, 1 min read

Inference-time Generative Debates on Coding and Reasoning Tasks for Scalable Oversight

ethanelasky and frank_b_n

26 Feb 2026 20:11 UTC

8 points

0 comments, 6 min read

A minor point about instrumental convergence that I would like feedback on

agrippa, 26 Feb 2026 19:44 UTC

4 points

5 comments, 2 min read

AI welfare as a demotivator for takeover.

Valentin2026, 26 Feb 2026 18:31 UTC

5 points

0 comments, 3 min read

Frontier AI companies probably can’t leave the US

Anders Cairns Woodruff, 26 Feb 2026 18:18 UTC

137 points

19 comments, 7 min read

(blog.redwoodresearch.org)

Improving Internal Model Principle

mremre, 26 Feb 2026 17:33 UTC

15 points

0 comments, 11 min read

A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior

harrymayne, Justin kang, Dewi Gould and noahys

26 Feb 2026 17:03 UTC

26 points

0 comments, 4 min read

How Robust Is Monitoring Against Secret Loyalties?

Joe Kwon, 26 Feb 2026 15:50 UTC

8 points

0 comments, 5 min read

UFO Aliens Are Your Gods

Lord Dreadwar, 26 Feb 2026 13:32 UTC

−49 points

18 comments, 4 min read

AI #157: Burn the Boats

Zvi, 26 Feb 2026 13:30 UTC

48 points

12 comments, 58 min read

(thezvi.wordpress.com)

How eval awareness might emerge in training

Igor Ivanov, 26 Feb 2026 10:59 UTC

26 points

12 comments, 6 min read

Strategic nuclear war twice as likely to occur by accident than by AI decisions according to new study

kromem, 26 Feb 2026 8:29 UTC

43 points

1 comment, 5 min read

What is Claude?

epicurus, 26 Feb 2026 4:26 UTC

14 points

0 comments, 7 min read

Why is Anthropic is okay with being used for disinformation?

ChristianKl, 26 Feb 2026 4:20 UTC

13 points

6 comments, 1 min read

Scoop: Pentagon takes first step toward blacklisting Anthropic

Matrice Jacobine, 26 Feb 2026 3:10 UTC

15 points

1 comment, 1 min read

(www.axios.com)

Transformers Have Computational Signatures Orthogonal to Semantic Content

luxia, 26 Feb 2026 2:55 UTC

10 points

2 comments, 13 min read

Alignment as Neural Integration: AI as a Cognitive Layer Accountable to Human Limbic Grounding

Ian Williams, 26 Feb 2026 2:51 UTC

2 points

1 comment, 7 min read

Investing in light of AI risk

AshL, 26 Feb 2026 2:51 UTC

7 points

0 comments, 5 min read

Whack-a-Mole is Not a Winnable Game

Sable, 26 Feb 2026 2:40 UTC

101 points

26 comments, 18 min read

(affablyevil.substack.com)

Announcing ControlConf 2026

Buck, 26 Feb 2026 2:23 UTC

82 points

4 comments, 2 min read

Ensuring Safety in Mixed Deployment

Cleo Nardo, 26 Feb 2026 2:15 UTC

22 points

0 comments, 5 min read

Map the Future Before You Build It

Molly and Deger Turan

26 Feb 2026 1:50 UTC

12 points

0 comments, 2 min read

(www.metaculus.com)

Schmidt Sciences’ request for proposals on the Science of Trustworthy AI

James Fox, 25 Feb 2026 21:42 UTC

31 points

0 comments, 12 min read

(schmidtsciences.smapply.io)

Naloe: A True Program Editor

TristanTrim, 25 Feb 2026 21:08 UTC

8 points

4 comments, 3 min read

Anthropic and the Department of War

Zvi, 25 Feb 2026 21:00 UTC

89 points

10 comments, 33 min read

(thezvi.wordpress.com)

Does the First Amendment protect Anthropic from Hegseth?

TFD, 25 Feb 2026 21:00 UTC

10 points

0 comments, 2 min read

(www.thefloatingdroid.com)

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels, 25 Feb 2026 19:43 UTC

81 points

5 comments, 8 min read

What secret goals does Claude think it has?

loops, 25 Feb 2026 19:22 UTC

93 points

11 comments, 4 min read

Splitting the Sun Equally

Commander Zander, 25 Feb 2026 18:49 UTC

8 points

1 comment, 3 min read

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon, 25 Feb 2026 18:17 UTC

14 points

0 comments, 3 min read

Training Agents to Self-Report Misbehavior

Bruce W. Lee, Yueh Han "John" Chen and Tomek Korbak

25 Feb 2026 17:50 UTC

26 points

0 comments, 8 min read

Why American Politics is Different Now (for Richard Ngo)

Shiva's Right Foot, 25 Feb 2026 17:42 UTC

1 point

13 comments, 4 min read

Beyond Moloch: The view from Evolutionary Game Theory

Jonah Wilberg, 25 Feb 2026 16:25 UTC

23 points

3 comments, 8 min read

Uncertain Updates: February 2026

Gordon Seidoh Worley, 25 Feb 2026 16:10 UTC

9 points

2 comments, 1 min read

(www.uncertainupdates.com)

Praise the Moloch!

Dentosal, 25 Feb 2026 12:15 UTC

−16 points

2 comments, 2 min read

Against Epistemic Humility and for Epistemic Precision

PranavG and Gabriel Alfour

25 Feb 2026 11:13 UTC

13 points

1 comment, 12 min read

(cognition.cafe)

Review: The Cape Town Observatory

spookyuser, 25 Feb 2026 10:22 UTC

12 points

0 comments, 8 min read

The Iron Kaleidoscope

edgecase64, 25 Feb 2026 6:24 UTC

2 points

0 comments, 2 min read

Prosaic Continual Learning

HunterJay, 25 Feb 2026 6:11 UTC

39 points

15 comments, 7 min read

Rumination is a habit (and you can break it!)

Declan Molony, 25 Feb 2026 2:57 UTC

24 points

5 comments, 3 min read

In-context learning alone can induce weird generalisation

Cozmin Ududec, Benji Berczi and Kyuhee Kim

25 Feb 2026 2:46 UTC

68 points

3 comments, 8 min read

On the phenomenological shift known as ‘stream entry’ and its implications for consciousness

cube_flipper, 25 Feb 2026 1:30 UTC

40 points

6 comments, 25 min read

(smoothbrains.net)

How to grow a nuke

RomanS, 25 Feb 2026 0:53 UTC

25 points

1 comment, 2 min read

A simple rule for causation

Vivek Hebbar, 24 Feb 2026 23:14 UTC

37 points

2 comments, 3 min read

SWE-Bench Pro is even worse

Jonathan Gabor, 24 Feb 2026 22:51 UTC

24 points

0 comments, 1 min read

(jonathanpgabor.substack.com)

We are all legal realists now

TFD, 24 Feb 2026 21:51 UTC

−12 points

1 comment, 4 min read

(www.thefloatingdroid.com)

Responsible Scaling Policy v3

HoldenKarnofsky, 24 Feb 2026 20:20 UTC

179 points

82 comments, 36 min read