
Introspective Interpretability: a Definition, Motivation, and Open Problems

Belinda Li

9 Feb 2026 23:53 UTC

10 points

0 comments · 13 min read · LW link

Job Listing (Closed): CBAI Operations Associate

Maite Abadia-Manthei and emreyavuz

9 Feb 2026 23:36 UTC

1 point

0 comments · 1 min read · LW link

Weight-Sparse Circuits May Be Interpretable Yet Unfaithful

jacob_drori

9 Feb 2026 23:25 UTC

136 points

5 comments · 8 min read · LW link

Gwern’s 2025 Inkhaven Writing Interview

gwern

9 Feb 2026 22:11 UTC

49 points

2 comments · 31 min read · LW link

(gwern.net)

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare

Zvi

9 Feb 2026 21:30 UTC

36 points

5 comments · 26 min read · LW link

(thezvi.wordpress.com)

Closure

Vadim Golub

9 Feb 2026 21:17 UTC

3 points

0 comments · 2 min read · LW link

Aurelius: Proposing Alignment as an Emergent Property

Austin McCaffrey

9 Feb 2026 20:13 UTC

−5 points

0 comments · 1 min read · LW link

(github.com)

Distributed vs centralized agents

Richard_Ngo

9 Feb 2026 20:06 UTC

51 points

9 comments · 1 min read · LW link

Stone Age Billionaire Can’t Words Good

Eneasz

9 Feb 2026 18:51 UTC

169 points

95 comments · 12 min read · LW link

(deathisbad.substack.com)

Do Models Continue Misaligned Actions? [eval]

Jordan Taylor

9 Feb 2026 16:59 UTC

76 points

12 comments · 11 min read · LW link

the extraordinary as mundane

Derek DeHart

9 Feb 2026 16:26 UTC

3 points

2 comments · 5 min read · LW link

(dehart.substack.com)

Large Language Models Live in Time

Eleni Angelou

9 Feb 2026 15:08 UTC

20 points

2 comments · 4 min read · LW link

Sympathy for the Model, or, Welfare Concerns as Takeover Risk

J Bostock

9 Feb 2026 14:19 UTC

42 points

37 comments · 3 min read · LW link

Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

Daan Henselmans, Arno Libert and LennardZ

9 Feb 2026 12:55 UTC

118 points

13 comments · 8 min read · LW link

Does an AI Society Need an Immune System? Accepting Yampolskiy’s Impossibility Results

Hiroshi Yamakawa

9 Feb 2026 12:32 UTC

13 points

0 comments · 10 min read · LW link

Can Hardware Save Us from Software?

Alvin Ånestrand

9 Feb 2026 11:57 UTC

23 points

2 comments · 12 min read · LW link

(forecastingaifutures.substack.com)

Complexity Science as Bridge to Eastern Philosophy

pchvykov

9 Feb 2026 10:40 UTC

1 point

2 comments · 2 min read · LW link

Design sketches for a more sensible world

owencb, Lizka, Oliver Sourbut and rosehadshar

9 Feb 2026 10:22 UTC

26 points

2 comments · 4 min read · LW link

(www.forethought.org)

Design sketches for angels-on-the-shoulder

owencb, Lizka, Oliver Sourbut and rosehadshar

9 Feb 2026 9:52 UTC

23 points

0 comments · 2 min read · LW link

(www.forethought.org)

Model Integrity and Character

Oliver Klingefjord

9 Feb 2026 8:15 UTC

12 points

3 comments · 6 min read · LW link

Eleven Practical Ways to Prepare for AGI

John-Clark Levin

9 Feb 2026 7:57 UTC

24 points

16 comments · 5 min read · LW link

The difference in risk/reward for Humanity as a super-organism vs. as a collection of individuals

ZhanRocks

9 Feb 2026 7:53 UTC

1 point

0 comments · 1 min read · LW link

Answer in your head

throwaway835543

9 Feb 2026 7:41 UTC

16 points

2 comments · 3 min read · LW link

Evaluating Conflict of Interest

warner

9 Feb 2026 7:30 UTC

1 point

0 comments · 2 min read · LW link

Three visions for diffuse control

Alek Westover

9 Feb 2026 6:41 UTC

8 points

0 comments · 3 min read · LW link

Observations and Complexity

Ape in the coat

9 Feb 2026 6:13 UTC

9 points

2 comments · 3 min read · LW link

(apeinthecoat102771.substack.com)

A Perfect Resurrection

MarkelKori

9 Feb 2026 1:33 UTC

9 points

16 comments · 3 min read · LW link

Empathy Has Outworn Its Place in Politics

Character#2736

8 Feb 2026 23:22 UTC

−26 points

8 comments · 4 min read · LW link

The Two-Board Problem: Training Environment for Research Agents

Valerii K.

8 Feb 2026 23:13 UTC

4 points

0 comments · 9 min read · LW link

Join My New Movement for the Post-AI World

E.G. Blee-Goldman

8 Feb 2026 22:18 UTC

0 points

0 comments · 7 min read · LW link

Donations, The Fifth Year

jenn

8 Feb 2026 22:04 UTC

39 points

0 comments · 4 min read · LW link

(www.jenn.site)

Every Measurement Has a Scale

CarolusRenniusVitellius

8 Feb 2026 20:07 UTC

17 points

4 comments · 4 min read · LW link

(charlesr-w.github.io)

UtopiaBench

nielsrolf

8 Feb 2026 18:19 UTC

67 points

10 comments · 1 min read · LW link

Smokey, This is not ’Nam Or: [Already] over the [red] line!

Davidmanheim

8 Feb 2026 12:24 UTC

110 points

22 comments · 4 min read · LW link

The optimal age to freeze eggs is 19

GeneSmith

8 Feb 2026 9:44 UTC

195 points

48 comments · 6 min read · LW link

It Is Reasonable To Research How To Use Model Internals In Training

Neel Nanda

8 Feb 2026 3:44 UTC

103 points

15 comments · 4 min read · LW link

Claude’s Bad Primer Fanfic

abramdemski

8 Feb 2026 0:39 UTC

24 points

12 comments · 54 min read · LW link

Can thoughtcrimes scare a cautious satisficer?

Knight Lee

7 Feb 2026 23:28 UTC

4 points

4 comments · 1 min read · LW link

[Question] What should I try to do this year?

abstractapplic

7 Feb 2026 22:06 UTC

36 points

4 comments · 1 min read · LW link

Does focusing on animal welfare make sense if you’re AI-pilled?

GradientDissenter

7 Feb 2026 20:51 UTC

13 points

7 comments · 8 min read · LW link

Near-Instantly Aborting the Worst Pain Imaginable with Psychedelics

eleweek

7 Feb 2026 16:11 UTC

217 points

13 comments · 13 min read · LW link

(psychotechnology.substack.com)

Why yeast-based vaccines could be a big deal for biosecurity

delton137

7 Feb 2026 16:08 UTC

62 points

8 comments · 11 min read · LW link

Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning

megasilverfist

7 Feb 2026 13:56 UTC

160 points

27 comments · 3 min read · LW link

Eunification: a Historical Perspective

Martin Sustrik

7 Feb 2026 13:31 UTC

19 points

5 comments · 5 min read · LW link

(www.250bpm.com)

Voting Results for the 2024 Review

RobertM

7 Feb 2026 3:48 UTC

98 points

0 comments · 1 min read · LW link

Playing with an Infrared Camera

jefftk

7 Feb 2026 3:30 UTC

33 points

1 comment · 1 min read · LW link

(www.jefftk.com)

Honey, I shrunk the brain

Andy_McKenzie

7 Feb 2026 0:01 UTC

128 points

1 comment · 5 min read · LW link

(neurobiology.substack.com)

Strategy of von Neumann and strategy of Rosenbergs

avturchin

6 Feb 2026 22:50 UTC

5 points

4 comments · 2 min read · LW link

Data-Centric Interpretability for LLM-based Multi-Agent Reinforcement Learning

michaelwaves, Yanjo and Yuqi Sun

6 Feb 2026 19:27 UTC

10 points

0 comments · 4 min read · LW link

Parks Aren’t Nature

Sable

6 Feb 2026 18:27 UTC

50 points

11 comments · 8 min read · LW link

(affablyevil.substack.com)