Coherent Care

abramdemski 27 Feb 2026 21:59 UTC
41 points
2 comments 16 min read LW link

The tick in my back

benjamin ar 27 Feb 2026 21:49 UTC
12 points
0 comments 4 min read LW link
(bjar.substack.com)

Side by Side Comparison of RSP Versions

Corm 27 Feb 2026 21:11 UTC
18 points
0 comments 1 min read LW link

Anthropic and the DoW: Anthropic Responds

Zvi 27 Feb 2026 20:50 UTC
56 points
3 comments 25 min read LW link
(thezvi.wordpress.com)

Ball+Gravity has a “Downhill” Preference

TristanTrim 27 Feb 2026 19:12 UTC
8 points
0 comments 2 min read LW link

Safe ASI Is Achievable: The Finite Game Argument

Lester Leong 27 Feb 2026 18:50 UTC
9 points
7 comments 22 min read LW link

[Question] Best short introductions to AI safety & alignment for bright college students?

geoffreymiller 27 Feb 2026 18:04 UTC
7 points
0 comments 1 min read LW link

New ARENA material: 8 exercise sets on alignment science & interpretability

CallumMcDougall 27 Feb 2026 17:37 UTC
104 points
1 comment 7 min read LW link

3 Challenges and 2 Hopes for the Safety of Unsupervised Elicitation

27 Feb 2026 17:25 UTC
21 points
0 comments 10 min read LW link

The Dawn of AI Scheming

Alvin Ånestrand 27 Feb 2026 17:24 UTC
19 points
0 comments 59 min read LW link
(forecastingaifutures.substack.com)

Sam Altman says OpenAI shares Anthropic’s red lines in Pentagon fight

Matrice Jacobine 27 Feb 2026 15:42 UTC
77 points
14 comments 3 min read LW link
(www.axios.com)

AI Security Bootcamp Singapore—Call for Applications

27 Feb 2026 13:34 UTC
5 points
0 comments 3 min read LW link

What I Got From 1.5 Years In Slightly-Competitive Debate

CarolusRenniusVitellius 27 Feb 2026 5:37 UTC
23 points
6 comments 8 min read LW link
(charlesr-w.github.io)

Here’s to the Polypropylene Makers

jefftk 27 Feb 2026 4:00 UTC
554 points
19 comments 2 min read LW link
(www.jefftk.com)

Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior

27 Feb 2026 3:20 UTC
60 points
12 comments 78 min read LW link

Vibe Coding is a System Design Interview

Brendan Long 27 Feb 2026 0:16 UTC
25 points
5 comments 1 min read LW link
(www.brendanlong.com)

Anthropic: “Statement from Dario Amodei on our discussions with the Department of War”

Matrice Jacobine 26 Feb 2026 23:45 UTC
159 points
22 comments 3 min read LW link
(www.anthropic.com)

Asymmetric Risks of Unfaithful Reasoning: Omission as the Critical Failure Mode for AI Monitoring

Divyansh Singhvi 26 Feb 2026 21:22 UTC
7 points
0 comments 4 min read LW link

Getting Back To It

sarahconstantin 26 Feb 2026 20:30 UTC
38 points
1 comment 7 min read LW link
(sarahconstantin.substack.com)

The Voices That Are Missing From Sex-Themed Online Communities

Bowl of Cereal 26 Feb 2026 20:23 UTC
−19 points
6 comments 1 min read LW link

Inference-time Generative Debates on Coding and Reasoning Tasks for Scalable Oversight

26 Feb 2026 20:11 UTC
8 points
0 comments 6 min read LW link

A minor point about instrumental convergence that I would like feedback on

agrippa 26 Feb 2026 19:44 UTC
4 points
5 comments 2 min read LW link

AI welfare as a demotivator for takeover.

Valentin2026 26 Feb 2026 18:31 UTC
5 points
0 comments 3 min read LW link

Frontier AI companies probably can’t leave the US

Anders Cairns Woodruff 26 Feb 2026 18:18 UTC
137 points
19 comments 7 min read LW link
(blog.redwoodresearch.org)

Improving Internal Model Principle

mremre 26 Feb 2026 17:33 UTC
15 points
0 comments 11 min read LW link

A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior

26 Feb 2026 17:03 UTC
26 points
0 comments 4 min read LW link

How Robust Is Monitoring Against Secret Loyalties?

Joe Kwon 26 Feb 2026 15:50 UTC
8 points
0 comments 5 min read LW link

UFO Aliens Are Your Gods

Lord Dreadwar 26 Feb 2026 13:32 UTC
−49 points
18 comments 4 min read LW link

AI #157: Burn the Boats

Zvi 26 Feb 2026 13:30 UTC
48 points
12 comments 58 min read LW link
(thezvi.wordpress.com)

How eval awareness might emerge in training

Igor Ivanov 26 Feb 2026 10:59 UTC
26 points
12 comments 6 min read LW link

Strategic nuclear war twice as likely to occur by accident than by AI decisions according to new study

kromem 26 Feb 2026 8:29 UTC
43 points
1 comment 5 min read LW link

What is Claude?

epicurus 26 Feb 2026 4:26 UTC
14 points
0 comments 7 min read LW link

Why is Anthropic is okay with being used for disinformation?

ChristianKl 26 Feb 2026 4:20 UTC
13 points
6 comments 1 min read LW link

Scoop: Pentagon takes first step toward blacklisting Anthropic

Matrice Jacobine 26 Feb 2026 3:10 UTC
15 points
1 comment 1 min read LW link
(www.axios.com)

Transformers Have Computational Signatures Orthogonal to Semantic Content

luxia 26 Feb 2026 2:55 UTC
10 points
2 comments 13 min read LW link

Alignment as Neural Integration: AI as a Cognitive Layer Accountable to Human Limbic Grounding

Ian Williams 26 Feb 2026 2:51 UTC
2 points
1 comment 7 min read LW link

Investing in light of AI risk

AshL 26 Feb 2026 2:51 UTC
7 points
0 comments 5 min read LW link

Whack-a-Mole is Not a Winnable Game

Sable 26 Feb 2026 2:40 UTC
101 points
26 comments 18 min read LW link
(affablyevil.substack.com)

Announcing ControlConf 2026

Buck 26 Feb 2026 2:23 UTC
82 points
4 comments 2 min read LW link

Ensuring Safety in Mixed Deployment

Cleo Nardo 26 Feb 2026 2:15 UTC
22 points
0 comments 5 min read LW link

Map the Future Before You Build It

26 Feb 2026 1:50 UTC
12 points
0 comments 2 min read LW link
(www.metaculus.com)

Schmidt Sciences’ request for proposals on the Science of Trustworthy AI

James Fox 25 Feb 2026 21:42 UTC
31 points
0 comments 12 min read LW link
(schmidtsciences.smapply.io)

Naloe: A True Program Editor

TristanTrim 25 Feb 2026 21:08 UTC
8 points
4 comments 3 min read LW link

Anthropic and the Department of War

Zvi 25 Feb 2026 21:00 UTC
89 points
10 comments 33 min read LW link
(thezvi.wordpress.com)

Does the First Amendment protect Anthropic from Hegseth?

TFD 25 Feb 2026 21:00 UTC
10 points
0 comments 2 min read LW link
(www.thefloatingdroid.com)

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels 25 Feb 2026 19:43 UTC
81 points
5 comments 8 min read LW link

What secret goals does Claude think it has?

loops 25 Feb 2026 19:22 UTC
93 points
11 comments 4 min read LW link

Splitting the Sun Equally

Commander Zander 25 Feb 2026 18:49 UTC
8 points
1 comment 3 min read LW link

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon 25 Feb 2026 18:17 UTC
14 points
0 comments 3 min read LW link

Training Agents to Self-Report Misbehavior

25 Feb 2026 17:50 UTC
26 points
0 comments 8 min read LW link