Schmidt Sciences’ request for proposals on the Science of Trustworthy AI

James Fox

25 Feb 2026 21:42 UTC

31 points

0 comments 12 min read LW link

(schmidtsciences.smapply.io)

Naloe: A True Program Editor

TristanTrim

25 Feb 2026 21:08 UTC

8 points

4 comments 3 min read LW link

Anthropic and the Department of War

Zvi

25 Feb 2026 21:00 UTC

89 points

10 comments 33 min read LW link

(thezvi.wordpress.com)

Does the First Amendment protect Anthropic from Hegseth?

TFD

25 Feb 2026 21:00 UTC

10 points

0 comments 2 min read LW link

(www.thefloatingdroid.com)

Character Training Induces Motivation Clarification: A Clue to Claude 3 Opus

Oliver Daniels

25 Feb 2026 19:43 UTC

81 points

5 comments 8 min read LW link

What secret goals does Claude think it has?

loops

25 Feb 2026 19:22 UTC

93 points

11 comments 4 min read LW link

Splitting the Sun Equally

Commander Zander

25 Feb 2026 18:49 UTC

8 points

1 comment 3 min read LW link

Reasoning Traces as a Path to Data-Efficient Generalization in Data Poisoning

Joe Kwon

25 Feb 2026 18:17 UTC

14 points

0 comments 3 min read LW link

Training Agents to Self-Report Misbehavior

Bruce W. Lee, Yueh Han "John" Chen and Tomek Korbak

25 Feb 2026 17:50 UTC

26 points

0 comments 8 min read LW link

Why American Politics is Different Now (for Richard Ngo)

Shiva's Right Foot

25 Feb 2026 17:42 UTC

1 point

13 comments 4 min read LW link

Beyond Moloch: The view from Evolutionary Game Theory

Jonah Wilberg

25 Feb 2026 16:25 UTC

23 points

3 comments 8 min read LW link

Uncertain Updates: February 2026

Gordon Seidoh Worley

25 Feb 2026 16:10 UTC

9 points

2 comments 1 min read LW link

(www.uncertainupdates.com)

Praise the Moloch!

Dentosal

25 Feb 2026 12:15 UTC

−16 points

2 comments 2 min read LW link

Against Epistemic Humility and for Epistemic Precision

PranavG and Gabriel Alfour

25 Feb 2026 11:13 UTC

13 points

1 comment 12 min read LW link

(cognition.cafe)

Review: The Cape Town Observatory

spookyuser

25 Feb 2026 10:22 UTC

12 points

0 comments 8 min read LW link

The Iron Kaleidoscope

edgecase64

25 Feb 2026 6:24 UTC

2 points

0 comments 2 min read LW link

Prosaic Continual Learning

HunterJay

25 Feb 2026 6:11 UTC

39 points

15 comments 7 min read LW link

Rumination is a habit (and you can break it!)

Declan Molony

25 Feb 2026 2:57 UTC

24 points

5 comments 3 min read LW link

In-context learning alone can induce weird generalisation

Cozmin Ududec, Benji Berczi and Kyuhee Kim

25 Feb 2026 2:46 UTC

68 points

3 comments 8 min read LW link

On the phenomenological shift known as ‘stream entry’ and its implications for consciousness

cube_flipper

25 Feb 2026 1:30 UTC

40 points

6 comments 25 min read LW link

(smoothbrains.net)

How to grow a nuke

RomanS

25 Feb 2026 0:53 UTC

25 points

1 comment 2 min read LW link

A simple rule for causation

Vivek Hebbar

24 Feb 2026 23:14 UTC

37 points

2 comments 3 min read LW link

SWE-Bench Pro is even worse

Jonathan Gabor

24 Feb 2026 22:51 UTC

24 points

0 comments 1 min read LW link

(jonathanpgabor.substack.com)

We are all legal realists now

TFD

24 Feb 2026 21:51 UTC

−12 points

1 comment 4 min read LW link

(www.thefloatingdroid.com)

Responsible Scaling Policy v3

HoldenKarnofsky

24 Feb 2026 20:20 UTC

179 points

82 comments 36 min read LW link

[Question] What was the most effective team you’ve ever been on, and what made it excellent?

Eli Tyre

24 Feb 2026 20:18 UTC

77 points

7 comments 2 min read LW link

Why Attack Success Rate Gives a False Picture of Backdoor Removal

Geoffrey Voyer

24 Feb 2026 20:02 UTC

3 points

0 comments 12 min read LW link

How I Started Being Productive

atomic

24 Feb 2026 19:49 UTC

8 points

0 comments 10 min read LW link

Solving The RAISE Act Like a (fictional) New York Detective

Josephine Schwab

24 Feb 2026 19:35 UTC

3 points

1 comment 6 min read LW link

Exclusive: Hegseth gives Anthropic until Friday to back down on AI safeguards

Matrice Jacobine

24 Feb 2026 19:19 UTC

95 points

9 comments 3 min read LW link

(www.axios.com)

Cigarette Ads for Babies from Microsoft Bing Image Generator

Edd Schneider

24 Feb 2026 19:06 UTC

−4 points

1 comment 4 min read LW link

Realistic Evaluations Will Not Prevent Evaluation Awareness

Adam Karvonen

24 Feb 2026 17:51 UTC

37 points

9 comments 6 min read LW link

The Easiest Route to Secret Loyalty May Be Hijacking the Model’s Chain of Command

Joe Kwon

24 Feb 2026 17:47 UTC

16 points

1 comment 5 min read LW link

Large-Scale Online Deanonymization with LLMs

Simon Lermen and Daniel Paleka

24 Feb 2026 17:02 UTC

69 points

5 comments 4 min read LW link

(simonlermen.substack.com)

Open sourcing a browser extension that shows when people are wrong on the internet

lc

24 Feb 2026 16:36 UTC

227 points

34 comments 2 min read LW link

(github.com)

Rascal’s Wager

corticalcircuitry

24 Feb 2026 16:13 UTC

3 points

2 comments 3 min read LW link

(sergey.substack.com)

Citrini’s Scenario Is A Great But Deeply Flawed Thought Experiment

Zvi

24 Feb 2026 15:40 UTC

37 points

6 comments 22 min read LW link

(thezvi.wordpress.com)

Observations from Running an Agent Collective

williawa

24 Feb 2026 15:34 UTC

45 points

2 comments 10 min read LW link

What is a species?

David Goodman

24 Feb 2026 14:23 UTC

49 points

15 comments 26 min read LW link

Moral public goods are a big deal for whether we get a good future

Mia Taylor, Tom Davidson and wdmacaskill

24 Feb 2026 14:14 UTC

12 points

0 comments 18 min read LW link

(www.forethought.org)

Two memos from 2024

Richard_Ngo

24 Feb 2026 7:19 UTC

38 points

0 comments 7 min read LW link

What is computational mechanics? An explainer

Leo Cymbalista

24 Feb 2026 6:09 UTC

16 points

0 comments 15 min read LW link

Monday AI Radar #14

Against Moloch

24 Feb 2026 5:34 UTC

4 points

0 comments 6 min read LW link

(againstmoloch.com)

The ML ontology and the alignment ontology

Richard_Ngo

24 Feb 2026 4:39 UTC

110 points

9 comments 4 min read LW link

[USA Today op-ed]: No, AI isn’t inevitable. We should stop it while we can.

David Scott Krueger

24 Feb 2026 2:05 UTC

17 points

0 comments 1 min read LW link

(www.usatoday.com)

Bioanchors 2: Electric Bacilli

TsviBT

24 Feb 2026 1:07 UTC

38 points

1 comment 7 min read LW link

Single Stack LLMs are Split-Brain Patients.

niceminus19

24 Feb 2026 0:04 UTC

5 points

0 comments 3 min read LW link

Using fiction to imagine a pathway to friendly AGI

Rick Moss

23 Feb 2026 23:48 UTC

3 points

0 comments 2 min read LW link

When Benchmarks Lie: Evaluating Malicious Prompt Classifiers Under True Distribution Shift

Max Fomin

23 Feb 2026 23:44 UTC

1 point

2 comments 6 min read LW link

The persona selection model

Sam Marks

23 Feb 2026 22:56 UTC

176 points

53 comments 43 min read LW link

(alignment.anthropic.com)