My journey to the microwave alternate timeline

Malmesbury

10 Feb 2026 17:59 UTC

782 points

58 comments, 10 min read

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley and David Lindner

10 Feb 2026 17:29 UTC

16 points

0 comments, 1 min read

(arxiv.org)

Heuristics for lab robotics, and where its future may go

Abhishaike Mahajan

10 Feb 2026 17:13 UTC

79 points

4 comments, 28 min read

(www.owlposting.com)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels

10 Feb 2026 17:06 UTC

27 points

5 comments, 3 min read

LLMs Views on Philosophy 2026

JonathanErhardt

10 Feb 2026 16:12 UTC

35 points

3 comments, 1 min read

Claude Opus 4.6: System Card Part 2: Frontier Alignment

Zvi

10 Feb 2026 16:10 UTC

46 points

0 comments, 18 min read

(thezvi.wordpress.com)

Coping with Deconversion

Benjamin Hendricks

10 Feb 2026 13:26 UTC

21 points

22 comments, 1 min read

“Recursive Self-Improvement” Is Three Different Things

Ihor Kendiukhov

10 Feb 2026 12:49 UTC

25 points

6 comments, 2 min read

SAE Feature Matchmaking (Layer-to-Layer)

Mitali M

10 Feb 2026 4:32 UTC

9 points

0 comments, 1 min read

Monday AI Radar #12

Against Moloch

10 Feb 2026 4:28 UTC

16 points

1 comment, 7 min read

(againstmoloch.com)

Ending Parking Space Saving

jefftk

10 Feb 2026 2:30 UTC

26 points

4 comments, 2 min read

(www.jefftk.com)

[Question] Should we consider Meta to be a criminal enterprise?

ChristianKl

10 Feb 2026 2:10 UTC

43 points

23 comments, 1 min read

[Question] OK, what’s the difference between coherence and representation theorems?

Algon

10 Feb 2026 0:45 UTC

15 points

7 comments, 2 min read

Introspective Interpretability: a Definition, Motivation, and Open Problems

Belinda Li

9 Feb 2026 23:53 UTC

10 points

0 comments, 13 min read

Job Listing (Closed): CBAI Operations Associate

Maite Abadia-Manthei and emreyavuz

9 Feb 2026 23:36 UTC

1 point

0 comments, 1 min read

Weight-Sparse Circuits May Be Interpretable Yet Unfaithful

jacob_drori

9 Feb 2026 23:25 UTC

136 points

5 comments, 8 min read

Gwern’s 2025 Inkhaven Writing Interview

gwern

9 Feb 2026 22:11 UTC

49 points

2 comments, 31 min read

(gwern.net)

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare

Zvi

9 Feb 2026 21:30 UTC

36 points

5 comments, 26 min read

(thezvi.wordpress.com)

Closure

Vadim Golub

9 Feb 2026 21:17 UTC

3 points

0 comments, 2 min read

Aurelius: Proposing Alignment as an Emergent Property

Austin McCaffrey

9 Feb 2026 20:13 UTC

−5 points

0 comments, 1 min read

(github.com)

Distributed vs centralized agents

Richard_Ngo

9 Feb 2026 20:06 UTC

51 points

9 comments, 1 min read

Stone Age Billionaire Can’t Words Good

Eneasz

9 Feb 2026 18:51 UTC

169 points

95 comments, 12 min read

(deathisbad.substack.com)

Do Models Continue Misaligned Actions? [eval]

Jordan Taylor

9 Feb 2026 16:59 UTC

76 points

12 comments, 11 min read

the extraordinary as mundane

Derek DeHart

9 Feb 2026 16:26 UTC

3 points

2 comments, 5 min read

(dehart.substack.com)

Large Language Models Live in Time

Eleni Angelou

9 Feb 2026 15:08 UTC

20 points

2 comments, 4 min read

Sympathy for the Model, or, Welfare Concerns as Takeover Risk

J Bostock

9 Feb 2026 14:19 UTC

42 points

37 comments, 3 min read

Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

Daan Henselmans, Arno Libert and LennardZ

9 Feb 2026 12:55 UTC

118 points

13 comments, 8 min read

Does an AI Society Need an Immune System? Accepting Yampolskiy’s Impossibility Results

Hiroshi Yamakawa

9 Feb 2026 12:32 UTC

13 points

0 comments, 10 min read

Can Hardware Save Us from Software?

Alvin Ånestrand

9 Feb 2026 11:57 UTC

23 points

2 comments, 12 min read

(forecastingaifutures.substack.com)

Complexity Science as Bridge to Eastern Philosophy

pchvykov

9 Feb 2026 10:40 UTC

1 point

2 comments, 2 min read

Design sketches for a more sensible world

owencb, Lizka, Oliver Sourbut and rosehadshar

9 Feb 2026 10:22 UTC

26 points

2 comments, 4 min read

(www.forethought.org)

Design sketches for angels-on-the-shoulder

owencb, Lizka, Oliver Sourbut and rosehadshar

9 Feb 2026 9:52 UTC

23 points

0 comments, 2 min read

(www.forethought.org)

Model Integrity and Character

Oliver Klingefjord

9 Feb 2026 8:15 UTC

12 points

3 comments, 6 min read

Eleven Practical Ways to Prepare for AGI

John-Clark Levin

9 Feb 2026 7:57 UTC

24 points

16 comments, 5 min read

The difference in risk/reward for Humanity as a super-organism vs. as a collection of individuals

ZhanRocks

9 Feb 2026 7:53 UTC

1 point

0 comments, 1 min read

Answer in your head

throwaway835543

9 Feb 2026 7:41 UTC

16 points

2 comments, 3 min read

Evaluating Conflict of Interest

warner

9 Feb 2026 7:30 UTC

1 point

0 comments, 2 min read

Three visions for diffuse control

Alek Westover

9 Feb 2026 6:41 UTC

8 points

0 comments, 3 min read

Observations and Complexity

Ape in the coat

9 Feb 2026 6:13 UTC

9 points

2 comments, 3 min read

(apeinthecoat102771.substack.com)

A Perfect Resurrection

MarkelKori

9 Feb 2026 1:33 UTC

9 points

16 comments, 3 min read

Empathy Has Outworn Its Place in Politics

Character#2736

8 Feb 2026 23:22 UTC

−26 points

8 comments, 4 min read

The Two-Board Problem: Training Environment for Research Agents

Valerii K.

8 Feb 2026 23:13 UTC

4 points

0 comments, 9 min read

Join My New Movement for the Post-AI World

E.G. Blee-Goldman

8 Feb 2026 22:18 UTC

0 points

0 comments, 7 min read

Donations, The Fifth Year

jenn

8 Feb 2026 22:04 UTC

39 points

0 comments, 4 min read

(www.jenn.site)

Every Measurement Has a Scale

CarolusRenniusVitellius

8 Feb 2026 20:07 UTC

17 points

4 comments, 4 min read

(charlesr-w.github.io)

UtopiaBench

nielsrolf

8 Feb 2026 18:19 UTC

67 points

10 comments, 1 min read

Smokey, This is not ’Nam Or: [Already] over the [red] line!

Davidmanheim

8 Feb 2026 12:24 UTC

110 points

22 comments, 4 min read

The optimal age to freeze eggs is 19

GeneSmith

8 Feb 2026 9:44 UTC

195 points

48 comments, 6 min read

It Is Reasonable To Research How To Use Model Internals In Training

Neel Nanda

8 Feb 2026 3:44 UTC

103 points

15 comments, 4 min read

Claude’s Bad Primer Fanfic

abramdemski

8 Feb 2026 0:39 UTC

24 points

12 comments, 54 min read