Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”

Matrice Jacobine · 26 Feb 2026 23:45 UTC
159 points
22 comments · 3 min read · LW link
(www.anthropic.com)

Asymmetric Risks of Unfaithful Reasoning: Omission as the Critical Failure Mode for AI Monitoring

Divyansh Singhvi · 26 Feb 2026 21:22 UTC
7 points
0 comments · 4 min read · LW link

Getting Back To It

sarahconstantin · 26 Feb 2026 20:30 UTC
38 points
1 comment · 7 min read · LW link
(sarahconstantin.substack.com)

The Voices That Are Missing From Sex-Themed Online Communities

Bowl of Cereal · 26 Feb 2026 20:23 UTC
−19 points
6 comments · 1 min read · LW link

Inference-time Generative Debates on Coding and Reasoning Tasks for Scalable Oversight

26 Feb 2026 20:11 UTC
8 points
0 comments · 6 min read · LW link

A minor point about instrumental convergence that I would like feedback on

agrippa · 26 Feb 2026 19:44 UTC
4 points
5 comments · 2 min read · LW link

AI welfare as a demotivator for takeover.

Valentin2026 · 26 Feb 2026 18:31 UTC
5 points
0 comments · 3 min read · LW link

Frontier AI companies probably can’t leave the US

Anders Cairns Woodruff · 26 Feb 2026 18:18 UTC
137 points
19 comments · 7 min read · LW link
(blog.redwoodresearch.org)

Improving Internal Model Principle

mremre · 26 Feb 2026 17:33 UTC
15 points
0 comments · 11 min read · LW link

A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior

26 Feb 2026 17:03 UTC
26 points
0 comments · 4 min read · LW link

How Robust Is Monitoring Against Secret Loyalties?

Joe Kwon · 26 Feb 2026 15:50 UTC
8 points
0 comments · 5 min read · LW link

UFO Aliens Are Your Gods

Lord Dreadwar · 26 Feb 2026 13:32 UTC
−49 points
18 comments · 4 min read · LW link

AI #157: Burn the Boats

Zvi · 26 Feb 2026 13:30 UTC
48 points
12 comments · 58 min read · LW link
(thezvi.wordpress.com)

How eval awareness might emerge in training

Igor Ivanov · 26 Feb 2026 10:59 UTC
26 points
12 comments · 6 min read · LW link

Strategic nuclear war twice as likely to occur by accident than by AI decisions according to new study

kromem · 26 Feb 2026 8:29 UTC
43 points
1 comment · 5 min read · LW link

What is Claude?

epicurus · 26 Feb 2026 4:26 UTC
14 points
0 comments · 7 min read · LW link

Why is Anthropic is okay with being used for disinformation?

ChristianKl · 26 Feb 2026 4:20 UTC
13 points
6 comments · 1 min read · LW link

Scoop: Pentagon takes first step toward blacklisting Anthropic

Matrice Jacobine · 26 Feb 2026 3:10 UTC
15 points
1 comment · 1 min read · LW link
(www.axios.com)

Transformers Have Computational Signatures Orthogonal to Semantic Content

luxia · 26 Feb 2026 2:55 UTC
10 points
2 comments · 13 min read · LW link

Alignment as Neural Integration: AI as a Cognitive Layer Accountable to Human Limbic Grounding

Ian Williams · 26 Feb 2026 2:51 UTC
2 points
1 comment · 7 min read · LW link

Investing in light of AI risk

AshL · 26 Feb 2026 2:51 UTC
7 points
0 comments · 5 min read · LW link

Whack-a-Mole is Not a Winnable Game

Sable · 26 Feb 2026 2:40 UTC
101 points
26 comments · 18 min read · LW link
(affablyevil.substack.com)

Announcing ControlConf 2026

Buck · 26 Feb 2026 2:23 UTC
82 points
4 comments · 2 min read · LW link

Ensuring Safety in Mixed Deployment

Cleo Nardo · 26 Feb 2026 2:15 UTC
22 points
0 comments · 5 min read · LW link

Map the Future Before You Build It

26 Feb 2026 1:50 UTC
12 points
0 comments · 2 min read · LW link
(www.metaculus.com)

Schmidt Sciences’ request for proposals on the Science of Trustworthy AI

James Fox · 25 Feb 2026 21:42 UTC
31 points
0 comments · 12 min read · LW link
(schmidtsciences.smapply.io)

Naloe: A True Program Editor

TristanTrim · 25 Feb 2026 21:08 UTC
8 points
4 comments · 3 min read · LW link

Anthropic and the Department of War

Zvi · 25 Feb 2026 21:00 UTC
89 points
10 comments · 33 min read · LW link
(thezvi.wordpress.com)

Does the First Amendment protect Anthropic from Hegseth?

TFD · 25 Feb 2026 21:00 UTC
10 points
0 comments · 2 min read · LW link
(www.thefloatingdroid.com)

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels · 25 Feb 2026 19:43 UTC
81 points
5 comments · 8 min read · LW link

What secret goals does Claude think it has?

loops · 25 Feb 2026 19:22 UTC
93 points
11 comments · 4 min read · LW link

Splitting the Sun Equally

Commander Zander · 25 Feb 2026 18:49 UTC
8 points
1 comment · 3 min read · LW link

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon · 25 Feb 2026 18:17 UTC
14 points
0 comments · 3 min read · LW link

Training Agents to Self-Report Misbehavior

25 Feb 2026 17:50 UTC
26 points
0 comments · 8 min read · LW link

Why American Politics is Different Now (for Richard Ngo)

Shiva's Right Foot · 25 Feb 2026 17:42 UTC
1 point
13 comments · 4 min read · LW link

Beyond Moloch: The view from Evolutionary Game Theory

Jonah Wilberg · 25 Feb 2026 16:25 UTC
23 points
3 comments · 8 min read · LW link

Uncertain Updates: February 2026

Gordon Seidoh Worley · 25 Feb 2026 16:10 UTC
9 points
2 comments · 1 min read · LW link
(www.uncertainupdates.com)

Praise the Moloch!

Dentosal · 25 Feb 2026 12:15 UTC
−16 points
2 comments · 2 min read · LW link

Against Epistemic Humility and for Epistemic Precision

25 Feb 2026 11:13 UTC
13 points
1 comment · 12 min read · LW link
(cognition.cafe)

Review: The Cape Town Observatory

spookyuser · 25 Feb 2026 10:22 UTC
12 points
0 comments · 8 min read · LW link

The Iron Kaleidoscope

edgecase64 · 25 Feb 2026 6:24 UTC
2 points
0 comments · 2 min read · LW link

Prosaic Continual Learning

HunterJay · 25 Feb 2026 6:11 UTC
39 points
15 comments · 7 min read · LW link

Rumination is a habit (and you can break it!)

Declan Molony · 25 Feb 2026 2:57 UTC
24 points
5 comments · 3 min read · LW link

In-context learning alone can induce weird generalisation

25 Feb 2026 2:46 UTC
68 points
3 comments · 8 min read · LW link

On the phenomenological shift known as ‘stream entry’ and its implications for consciousness

cube_flipper · 25 Feb 2026 1:30 UTC
40 points
6 comments · 25 min read · LW link
(smoothbrains.net)

How to grow a nuke

RomanS · 25 Feb 2026 0:53 UTC
25 points
1 comment · 2 min read · LW link

A simple rule for causation

Vivek Hebbar · 24 Feb 2026 23:14 UTC
37 points
2 comments · 3 min read · LW link

SWE-Bench Pro is even worse

Jonathan Gabor · 24 Feb 2026 22:51 UTC
24 points
0 comments · 1 min read · LW link
(jonathanpgabor.substack.com)

We are all legal realists now

TFD · 24 Feb 2026 21:51 UTC
−12 points
1 comment · 4 min read · LW link
(www.thefloatingdroid.com)

Responsible Scaling Policy v3

HoldenKarnofsky · 24 Feb 2026 20:20 UTC
179 points
82 comments · 36 min read · LW link