Schmidt Sciences’ request for proposals on the Science of Trustworthy AI

James Fox
25 Feb 2026 21:42 UTC
31 points
0 comments
12 min read
(schmidtsciences.smapply.io)

Naloe: A True Program Editor

TristanTrim
25 Feb 2026 21:08 UTC
8 points
4 comments
3 min read

Anthropic and the Department of War

Zvi
25 Feb 2026 21:00 UTC
89 points
10 comments
33 min read
(thezvi.wordpress.com)

Does the First Amendment protect Anthropic from Hegseth?

TFD
25 Feb 2026 21:00 UTC
10 points
0 comments
2 min read
(www.thefloatingdroid.com)

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels
25 Feb 2026 19:43 UTC
81 points
5 comments
8 min read

What secret goals does Claude think it has?

loops
25 Feb 2026 19:22 UTC
93 points
11 comments
4 min read

Splitting the Sun Equally

Commander Zander
25 Feb 2026 18:49 UTC
8 points
1 comment
3 min read

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon
25 Feb 2026 18:17 UTC
14 points
0 comments
3 min read

Training Agents to Self-Report Misbehavior

25 Feb 2026 17:50 UTC
26 points
0 comments
8 min read

Why American Politics is Different Now (for Richard Ngo)

Shiva's Right Foot
25 Feb 2026 17:42 UTC
1 point
13 comments
4 min read

Beyond Moloch: The view from Evolutionary Game Theory

Jonah Wilberg
25 Feb 2026 16:25 UTC
23 points
3 comments
8 min read

Uncertain Updates: February 2026

Gordon Seidoh Worley
25 Feb 2026 16:10 UTC
9 points
2 comments
1 min read
(www.uncertainupdates.com)

Praise the Moloch!

Dentosal
25 Feb 2026 12:15 UTC
−16 points
2 comments
2 min read

Against Epistemic Humility and for Epistemic Precision

25 Feb 2026 11:13 UTC
13 points
1 comment
12 min read
(cognition.cafe)

Review: The Cape Town Observatory

spookyuser
25 Feb 2026 10:22 UTC
12 points
0 comments
8 min read

The Iron Kaleidoscope

edgecase64
25 Feb 2026 6:24 UTC
2 points
0 comments
2 min read

Prosaic Continual Learning

HunterJay
25 Feb 2026 6:11 UTC
39 points
15 comments
7 min read

Rumination is a habit (and you can break it!)

Declan Molony
25 Feb 2026 2:57 UTC
24 points
5 comments
3 min read

In-context learning alone can induce weird generalisation

25 Feb 2026 2:46 UTC
68 points
3 comments
8 min read

On the phenomenological shift known as ‘stream entry’ and its implications for consciousness

cube_flipper
25 Feb 2026 1:30 UTC
40 points
6 comments
25 min read
(smoothbrains.net)

How to grow a nuke

RomanS
25 Feb 2026 0:53 UTC
25 points
1 comment
2 min read

A simple rule for causation

Vivek Hebbar
24 Feb 2026 23:14 UTC
37 points
2 comments
3 min read

SWE-Bench Pro is even worse

Jonathan Gabor
24 Feb 2026 22:51 UTC
24 points
0 comments
1 min read
(jonathanpgabor.substack.com)

We are all legal realists now

TFD
24 Feb 2026 21:51 UTC
−12 points
1 comment
4 min read
(www.thefloatingdroid.com)

Responsible Scaling Policy v3

HoldenKarnofsky
24 Feb 2026 20:20 UTC
179 points
82 comments
36 min read

[Question] What was the most effective team you’ve ever been on, and what made it excellent?

Eli Tyre
24 Feb 2026 20:18 UTC
77 points
7 comments
2 min read

Why Attack Success Rate Gives a False Picture of Backdoor Removal

Geoffrey Voyer
24 Feb 2026 20:02 UTC
3 points
0 comments
12 min read

How I Started Being Productive

atomic
24 Feb 2026 19:49 UTC
8 points
0 comments
10 min read

Solving The RAISE Act Like a (fictional) New York Detective

Josephine Schwab
24 Feb 2026 19:35 UTC
3 points
1 comment
6 min read

Exclusive: Hegseth gives Anthropic until Friday to back down on AI safeguards

Matrice Jacobine
24 Feb 2026 19:19 UTC
95 points
9 comments
3 min read
(www.axios.com)

Cigarette Ads for Babies from Microsoft Bing Image Generator

Edd Schneider
24 Feb 2026 19:06 UTC
−4 points
1 comment
4 min read

Realistic Evaluations Will Not Prevent Evaluation Awareness

Adam Karvonen
24 Feb 2026 17:51 UTC
37 points
9 comments
6 min read

The Easiest Route to Secret Loyalty May Be Hijacking the Model’s Chain of Command

Joe Kwon
24 Feb 2026 17:47 UTC
16 points
1 comment
5 min read

Large-Scale Online Deanonymization with LLMs

24 Feb 2026 17:02 UTC
69 points
5 comments
4 min read
(simonlermen.substack.com)

Open sourcing a browser extension that shows when people are wrong on the internet

lc
24 Feb 2026 16:36 UTC
227 points
34 comments
2 min read
(github.com)

Rascal’s Wager

corticalcircuitry
24 Feb 2026 16:13 UTC
3 points
2 comments
3 min read
(sergey.substack.com)

Citrini’s Scenario Is A Great But Deeply Flawed Thought Experiment

Zvi
24 Feb 2026 15:40 UTC
37 points
6 comments
22 min read
(thezvi.wordpress.com)

Observations from Running an Agent Collective

williawa
24 Feb 2026 15:34 UTC
45 points
2 comments
10 min read

What is a species?

David Goodman
24 Feb 2026 14:23 UTC
49 points
15 comments
26 min read

Moral public goods are a big deal for whether we get a good future

24 Feb 2026 14:14 UTC
12 points
0 comments
18 min read
(www.forethought.org)

Two memos from 2024

Richard_Ngo
24 Feb 2026 7:19 UTC
38 points
0 comments
7 min read

What is computational mechanics? An explainer

Leo Cymbalista
24 Feb 2026 6:09 UTC
16 points
0 comments
15 min read

Monday AI Radar #14

Against Moloch
24 Feb 2026 5:34 UTC
4 points
0 comments
6 min read
(againstmoloch.com)

The ML ontology and the alignment ontology

Richard_Ngo
24 Feb 2026 4:39 UTC
110 points
9 comments
4 min read

[USA Today op-ed]: No, AI isn’t inevitable. We should stop it while we can.

David Scott Krueger
24 Feb 2026 2:05 UTC
17 points
0 comments
1 min read
(www.usatoday.com)

Bioanchors 2: Electric Bacilli

TsviBT
24 Feb 2026 1:07 UTC
38 points
1 comment
7 min read

Single Stack LLMs are Split-Brain Patients.

niceminus19
24 Feb 2026 0:04 UTC
5 points
0 comments
3 min read

Using fiction to imagine a pathway to friendlyAGI

Rick Moss
23 Feb 2026 23:48 UTC
3 points
0 comments
2 min read

When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Max Fomin
23 Feb 2026 23:44 UTC
1 point
2 comments
6 min read

The persona selection model

Sam Marks
23 Feb 2026 22:56 UTC
176 points
53 comments
43 min read
(alignment.anthropic.com)