Deception Channeling: Training Models to Always Verbalize Alignment Faking

Florian_Dietz, 17 Feb 2026 22:28 UTC

7 points

2 comments, 9 min read, LW link

Rephrasing Reduces Eval Awareness...

atharva, 17 Feb 2026 22:23 UTC

23 points

4 comments, 3 min read, LW link

The Math And The Territory

cylonator, 17 Feb 2026 21:53 UTC

2 points

0 comments, 8 min read, LW link

Words are not dead

William tirkey, 17 Feb 2026 21:42 UTC

−2 points

2 comments, 5 min read, LW link

Review of the System Theory as a Field of Knowledge

siarshai, 17 Feb 2026 21:34 UTC

4 points

1 comment, 18 min read, LW link

You’re an AI Expert – Not an Influencer

Max Winga, 17 Feb 2026 21:03 UTC

180 points

25 comments, 6 min read, LW link

(maxwinga.substack.com)

“We are confused about agency”

Cole Wyeth, 17 Feb 2026 19:51 UTC

57 points

37 comments, 3 min read, LW link

Maybe benchmarks should be broken?

Jonathan Gabor, 17 Feb 2026 19:49 UTC

24 points

2 comments, 1 min read, LW link

(jonathanpgabor.substack.com)

The brain is a machine that runs an algorithm

Steven Byrnes, 17 Feb 2026 19:36 UTC

114 points

18 comments, 4 min read, LW link

TV Detector Vans

J Bostock, 17 Feb 2026 18:29 UTC

57 points

10 comments, 2 min read, LW link

Notes on International Klein Blue

jenn, 17 Feb 2026 17:51 UTC

46 points

0 comments, 5 min read, LW link

(www.jenn.site)

How to fail anything: a complete guide

Crazy philosopher, 17 Feb 2026 17:44 UTC

1 point

0 comments, 4 min read, LW link

Superintelligence Alignment Seminar (1 month focused upskilling)

Mateusz Bagiński, 17 Feb 2026 17:03 UTC

115 points

13 comments, 3 min read, LW link

The Multi-Agent Minefield: Can LLMs Cooperate to Avoid Global Catastrophe?

Zhijing Jin, phamt, TerryJCZhang, pepijn_cobben, Angelo Huang, Isabel Dahlgren and Jacob Brinton

17 Feb 2026 16:55 UTC

14 points

2 comments, 5 min read, LW link

Persuading Trump of a proper US-China-led AI Treaty

rguerreschi, 17 Feb 2026 16:37 UTC

9 points

8 comments, 6 min read, LW link

AI Safety via Generalization and Caution: A Research Agenda

Benjamin Plaut, 17 Feb 2026 16:01 UTC

1 point

0 comments, 14 min read, LW link

On Dwarkesh Patel’s 2026 Podcast With Elon Musk and Other Recent Elon Musk Things

Zvi, 17 Feb 2026 15:30 UTC

56 points

2 comments, 26 min read, LW link

(thezvi.wordpress.com)

We need a hardware moratorium now

KanHar, 17 Feb 2026 13:23 UTC

11 points

3 comments, 9 min read, LW link

NEST: Nascent Encoded Steganographic Thoughts

Artem Karpov, 17 Feb 2026 7:55 UTC

20 points

8 comments, 13 min read, LW link

[Question] Why did you buy Bitcoin?

NoSignalNoNoise, 17 Feb 2026 5:20 UTC

11 points

1 comment, 1 min read, LW link

Gyre

vgel, 17 Feb 2026 0:38 UTC

260 points

24 comments, 8 min read, LW link

(vgel.me)

Words Are A Leaky Abstraction

sonicrocketman, 16 Feb 2026 22:20 UTC

1 point

0 comments, 5 min read, LW link

(brianschrader.com)

Correlation Does in Fact Imply Causation

KaseyMarkel, 16 Feb 2026 21:17 UTC

5 points

15 comments, 3 min read, LW link

Sealed Predictions—A Solution.

george_is_thinking, 16 Feb 2026 20:59 UTC

11 points

2 comments, 5 min read, LW link

Memory Decoding Journal Club: The Songbird as a Model for the Generation and Learning of Complex Sequential Behaviors

Devin Ward, 16 Feb 2026 20:46 UTC

2 points

0 comments, 1 min read, LW link

Contra Caplan on higher education

Richard_Ngo, 16 Feb 2026 20:43 UTC

55 points

15 comments, 7 min read, LW link

(www.mindthefuture.info)

Will reward-seekers respond to distant incentives?

Alex Mallen, 16 Feb 2026 19:35 UTC

57 points

4 comments, 10 min read, LW link

[Question] What’s Your P(WEIRD)?

RogerDearnaley, 16 Feb 2026 18:19 UTC

27 points

18 comments, 9 min read, LW link

Estimating METR Time Horizons for Claude Opus 4.6 and GPT 5.3 Codex (xhigh)

CharlesD, 16 Feb 2026 18:14 UTC

33 points

6 comments, 3 min read, LW link

Charlatan Labyrinth

niplav, 16 Feb 2026 17:56 UTC

16 points

8 comments, 1 min read, LW link

Jailbreaking is Empirical Evidence for Inner Misalignment and Against Alignment by Default

Jérémy Andréoletti, 16 Feb 2026 17:49 UTC

51 points

16 comments, 2 min read, LW link

Break Stasis

Oldmanrahul, 16 Feb 2026 17:33 UTC

2 points

0 comments, 2 min read, LW link

(oldmanrahul.com)

LLM Self-Expression Through Music Videos

Josh Snider, 16 Feb 2026 17:09 UTC

14 points

0 comments, 7 min read, LW link

Towards A Happy Future With AI Employers

Lukas Petersson, 16 Feb 2026 17:00 UTC

12 points

0 comments, 1 min read, LW link

(andonlabs.com)

Persona Parasitology

Raymond Douglas, 16 Feb 2026 16:22 UTC

177 points

38 comments, 11 min read, LW link

On Dwarkesh Patel’s 2026 Podcast With Dario Amodei

Zvi, 16 Feb 2026 14:30 UTC

42 points

0 comments, 16 min read, LW link

(thezvi.wordpress.com)

WeirdML Time Horizons

Håvard Tveit Ihle, 16 Feb 2026 10:25 UTC

90 points

2 comments, 11 min read, LW link

Text Posts from the Kids Group: 2025

jefftk, 16 Feb 2026 10:00 UTC

15 points

1 comment, 14 min read, LW link

(www.jefftk.com)

building sqlite with a small swarm

kian, 16 Feb 2026 5:33 UTC

7 points

4 comments, 1 min read, LW link

(kiankyars.github.io)

My experience of the 2025 CFAR Workshop

Cookie penguin, 16 Feb 2026 3:33 UTC

83 points

4 comments, 4 min read, LW link

Cultivating Gardens

jenn and jas.

16 Feb 2026 1:40 UTC

28 points

1 comment, 22 min read, LW link

The World Keeps Getting Saved and You Don’t Notice

Bogoed, 16 Feb 2026 1:01 UTC

210 points

20 comments, 2 min read, LW link

Most Observers Are Alone: The Fermi Paradox as Default

SE Gyges, 16 Feb 2026 0:52 UTC

29 points

12 comments, 4 min read, LW link

(segyges.leaflet.pub)

Aligning to Virtues

Richard_Ngo, 16 Feb 2026 0:37 UTC

93 points

36 comments, 4 min read, LW link

Attach Yourself to the Right Person, and You’ll Go Far (a nerdy poem about bugs)

Character#2736, 15 Feb 2026 23:36 UTC

10 points

0 comments, 1 min read, LW link

Model multitasking: Can a model learn two different tasks simultaneously through Grokking?

arcee183, 15 Feb 2026 23:06 UTC

7 points

0 comments, 9 min read, LW link

Phantom Transfer and the Basic Science of Data Poisoning

draganover, Tolga H. Dur, Andi Bhongade and Mary Phuong

15 Feb 2026 19:51 UTC

82 points

8 comments, 6 min read, LW link

Should anyone’s “analysis” of extremely complex systems, such as geopolitics, be taken seriously? or, Does anyone take a 5 year old’s “analysis” of decently complex systems, like big city politics, seriously?

M. Y. Zuo, 15 Feb 2026 18:44 UTC

18 points

5 comments, 1 min read, LW link

Painless Activation Steering

Sasha Cui, 15 Feb 2026 17:49 UTC

14 points

2 comments, 1 min read, LW link

(open.substack.com)

PieArena: Language Agents Negotiating Against Yale MBAs

Sasha Cui, 15 Feb 2026 17:45 UTC

5 points

0 comments, 1 min read, LW link

(open.substack.com)