Timeless Engineering

Jack Bradshaw

11 Feb 2026 23:53 UTC

−14 points

0 comments, 5 min read

[Paper] How does information access affect LLM monitors’ ability to detect sabotage?

Rauno Arike, Raja Moreno, RohanS, Shubhorup Biswas and Francis Rhys Ward

11 Feb 2026 21:25 UTC

26 points

0 comments, 6 min read

Claude Opus 4.6 Escalates Things Quickly

Zvi

11 Feb 2026 21:20 UTC

51 points

0 comments, 34 min read

(thezvi.wordpress.com)

Where Will Call Center Workers Go?

loic

11 Feb 2026 20:44 UTC

19 points

2 comments, 4 min read

Distinguish between inference scaling and “larger tasks use more compute”

ryan_greenblatt

11 Feb 2026 18:37 UTC

87 points

5 comments, 2 min read

Monitor Jailbreaking: Evading Chain-of-Thought Monitoring Without Encoded Reasoning

Wuschel Schulz

11 Feb 2026 17:18 UTC

61 points

17 comments, 5 min read

[Hiring] Principia Research Fellows

Matthias Dellago and Jin Hwa Lee

11 Feb 2026 16:30 UTC

35 points

1 comment, 3 min read

The SaaS Bloodbath: the Opportunities and Perils for Investors

ykevinzhang

11 Feb 2026 16:17 UTC

0 points

0 comments, 4 min read

On Resolving the Great Matter

Gordon Seidoh Worley

11 Feb 2026 15:30 UTC

11 points

7 comments, 3 min read

(www.uncertainupdates.com)

Is a constitution a “noble lie”?

SpectrumDT

11 Feb 2026 15:08 UTC

4 points

10 comments, 2 min read

Jevons Burnout

Kemp

11 Feb 2026 13:29 UTC

−3 points

1 comment, 1 min read

Strategic awareness tools: design sketches

rosehadshar, owencb, Lizka and Oliver Sourbut

11 Feb 2026 12:28 UTC

18 points

2 comments, 1 min read

(www.forethought.org)

Introspective RSI vs Extrospective RSI

Cleo Nardo

11 Feb 2026 11:54 UTC

10 points

6 comments, 2 min read

[Question] What concrete mechanisms could lead to AI models having open-ended goals?

Jemal Young

11 Feb 2026 9:08 UTC

10 points

4 comments, 1 min read

Is Everything Connected? A McLuhan Thought Experiment

R0sberg

11 Feb 2026 6:04 UTC

2 points

0 comments, 6 min read

Designing Prediction Markets

ToasterLightning

11 Feb 2026 5:38 UTC

58 points

6 comments, 7 min read

punctilio: the best text prettifier

TurnTrout

11 Feb 2026 4:49 UTC

24 points

0 comments, 5 min read

(github.com)

LessOnline 2026: June 5-7, Berkeley, CA (save the date)

Ruby

11 Feb 2026 0:15 UTC

56 points

7 comments, 1 min read

(Less.Online)

Building a Regex Engine with a team of parallel Claudes

kian

11 Feb 2026 0:08 UTC

2 points

2 comments, 1 min read

(kiankyars.github.io)

My journey to the microwave alternate timeline

Malmesbury

10 Feb 2026 17:59 UTC

782 points

58 comments, 10 min read

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley and David Lindner

10 Feb 2026 17:29 UTC

16 points

0 comments, 1 min read

(arxiv.org)

Heuristics for lab robotics, and where its future may go

Abhishaike Mahajan

10 Feb 2026 17:13 UTC

79 points

4 comments, 28 min read

(www.owlposting.com)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels

10 Feb 2026 17:06 UTC

27 points

5 comments, 3 min read

LLMs Views on Philosophy 2026

JonathanErhardt

10 Feb 2026 16:12 UTC

35 points

3 comments, 1 min read

Claude Opus 4.6: System Card Part 2: Frontier Alignment

Zvi

10 Feb 2026 16:10 UTC

46 points

0 comments, 18 min read

(thezvi.wordpress.com)

Coping with Deconversion

Benjamin Hendricks

10 Feb 2026 13:26 UTC

21 points

22 comments, 1 min read

“Recursive Self-Improvement” Is Three Different Things

Ihor Kendiukhov

10 Feb 2026 12:49 UTC

25 points

6 comments, 2 min read

SAE Feature Matchmaking (Layer-to-Layer)

Mitali M

10 Feb 2026 4:32 UTC

9 points

0 comments, 1 min read

Monday AI Radar #12

Against Moloch

10 Feb 2026 4:28 UTC

16 points

1 comment, 7 min read

(againstmoloch.com)

Ending Parking Space Saving

jefftk

10 Feb 2026 2:30 UTC

26 points

4 comments, 2 min read

(www.jefftk.com)

[Question] Should we consider Meta to be a criminal enterprise?

ChristianKl

10 Feb 2026 2:10 UTC

43 points

23 comments, 1 min read

[Question] OK, what’s the difference between coherence and representation theorems?

Algon

10 Feb 2026 0:45 UTC

15 points

7 comments, 2 min read

Introspective Interpretability: a Definition, Motivation, and Open Problems

Belinda Li

9 Feb 2026 23:53 UTC

10 points

0 comments, 13 min read

Job Listing (Closed): CBAI Operations Associate

Maite Abadia-Manthei and emreyavuz

9 Feb 2026 23:36 UTC

1 point

0 comments, 1 min read

Weight-Sparse Circuits May Be Interpretable Yet Unfaithful

jacob_drori

9 Feb 2026 23:25 UTC

136 points

5 comments, 8 min read

Gwern’s 2025 Inkhaven Writing Interview

gwern

9 Feb 2026 22:11 UTC

49 points

2 comments, 31 min read

(gwern.net)

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare

Zvi

9 Feb 2026 21:30 UTC

36 points

5 comments, 26 min read

(thezvi.wordpress.com)

Closure

Vadim Golub

9 Feb 2026 21:17 UTC

3 points

0 comments, 2 min read

Aurelius: Proposing Alignment as an Emergent Property

Austin McCaffrey

9 Feb 2026 20:13 UTC

−5 points

0 comments, 1 min read

(github.com)

Distributed vs centralized agents

Richard_Ngo

9 Feb 2026 20:06 UTC

51 points

9 comments, 1 min read

Stone Age Billionaire Can’t Words Good

Eneasz

9 Feb 2026 18:51 UTC

169 points

95 comments, 12 min read

(deathisbad.substack.com)

Do Models Continue Misaligned Actions? [eval]

Jordan Taylor

9 Feb 2026 16:59 UTC

76 points

12 comments, 11 min read

the extraordinary as mundane

Derek DeHart

9 Feb 2026 16:26 UTC

3 points

2 comments, 5 min read

(dehart.substack.com)

Large Language Models Live in Time

Eleni Angelou

9 Feb 2026 15:08 UTC

20 points

2 comments, 4 min read

Sympathy for the Model, or, Welfare Concerns as Takeover Risk

J Bostock

9 Feb 2026 14:19 UTC

42 points

37 comments, 3 min read

Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

Daan Henselmans, Arno Libert and LennardZ

9 Feb 2026 12:55 UTC

118 points

13 comments, 8 min read

Does an AI Society Need an Immune System? Accepting Yampolskiy’s Impossibility Results

Hiroshi Yamakawa

9 Feb 2026 12:32 UTC

13 points

0 comments, 10 min read

Can Hardware Save Us from Software?

Alvin Ånestrand

9 Feb 2026 11:57 UTC

23 points

2 comments, 12 min read

(forecastingaifutures.substack.com)

Complexity Science as Bridge to Eastern Philosophy

pchvykov

9 Feb 2026 10:40 UTC

1 point

2 comments, 2 min read

Design sketches for a more sensible world

owencb, Lizka, Oliver Sourbut and rosehadshar

9 Feb 2026 10:22 UTC

26 points

2 comments, 4 min read

(www.forethought.org)