12 Feb 2026 23:26 UTC

99 points

12 comments · 84 min read · LW link

(Re)Discovering Natural Laws

Margot

12 Feb 2026 21:45 UTC

13 points

0 comments · 17 min read · LW link

An Ontology of Representations: Limits of Universality

Margot

12 Feb 2026 21:43 UTC

23 points

1 comment · 39 min read · LW link

A Closer Look at the “Societies of Thought” Paper

Against Moloch

12 Feb 2026 21:38 UTC

10 points

0 comments · 3 min read · LW link

(againstmoloch.com)

models have some pretty funny attractor states

aryaj, Senthooran Rajamanoharan and Neel Nanda

12 Feb 2026 21:14 UTC

275 points

38 comments · 18 min read · LW link

Stay in your human loop

benjamin ar

12 Feb 2026 21:05 UTC

22 points

0 comments · 5 min read · LW link

(bjar.substack.com)

The case for industrial evals

Andre Assis and Monte M

12 Feb 2026 20:45 UTC

16 points

0 comments · 23 min read · LW link

Multiverse sampling assumption

avturchin

12 Feb 2026 19:59 UTC

12 points

0 comments · 5 min read · LW link

What We Learned from Briefing 140+ Lawmakers on the Threat from AI

leticiagarcia

12 Feb 2026 19:53 UTC

174 points

7 comments · 14 min read · LW link

(substack.com)

Paper: Prompt Optimization Makes Misalignment Legible

Caleb Biddulph and micahcarroll

12 Feb 2026 19:45 UTC

63 points

8 comments · 8 min read · LW link

Claude’s Constitution

PeterMcCluskey

12 Feb 2026 19:44 UTC

15 points

4 comments · 6 min read · LW link

(bayesianinvestor.com)

Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities

Seth Herd

12 Feb 2026 19:38 UTC

48 points

16 comments · 18 min read · LW link

Good AI Epistemics as an Offramp from the Intelligence Explosion

Ben Goldhaber

12 Feb 2026 19:18 UTC

23 points

2 comments · 3 min read · LW link

How Secret Loyalty Differs from Standard Backdoor Threats

Joe Kwon

12 Feb 2026 18:48 UTC

23 points

4 comments · 12 min read · LW link

You get about.… how many words exactly?

Raemon

12 Feb 2026 18:06 UTC

21 points

1 comment · 7 min read · LW link

Basic Legibility Protocols Improve Trusted Monitoring

SebastianP and theashwinner

12 Feb 2026 17:50 UTC

8 points

4 comments · 11 min read · LW link

A research agenda for the final year

Mitchell_Porter

12 Feb 2026 17:24 UTC

13 points

22 comments · 3 min read · LW link

Polysemanticity is a Misnomer

Shiva's Right Foot

12 Feb 2026 17:22 UTC

11 points

0 comments · 3 min read · LW link

Optimal Timing for Superintelligence: Mundane Considerations for Existing People

Nick Bostrom

12 Feb 2026 17:06 UTC

49 points

89 comments · 72 min read · LW link

How do we (more) safely defer to AIs?

ryan_greenblatt and Julian Stastny

12 Feb 2026 16:55 UTC

83 points

5 comments · 72 min read · LW link

A Conceptual Framework for Exploration Hacking

Joschka Braun, Eyon Jang and Damon Falck

12 Feb 2026 16:33 UTC

26 points

2 comments · 9 min read · LW link

AI #155: Welcome to Recursive Self-Improvement

Zvi

12 Feb 2026 16:10 UTC

52 points

5 comments · 56 min read · LW link

(thezvi.wordpress.com)

The Facade of AI Safety Will Crumble

Liron

12 Feb 2026 15:57 UTC

36 points

11 comments · 4 min read · LW link

(doomdebates.com)

The history of light

Kotlopou

12 Feb 2026 14:16 UTC

16 points

0 comments · 1 min read · LW link

(beatingthehydra.substack.com)

Three Worlds Collide assumes calibration is solved

Vyacheslav Ladischenski (Slava)

12 Feb 2026 4:28 UTC

7 points

1 comment · 3 min read · LW link

Research note: A simpler AI timelines model predicts 99% AI R&D automation in ~2032

Thomas Kwa

12 Feb 2026 0:13 UTC

69 points

15 comments · 8 min read · LW link

(metr.org)

Timeless Engineering

Jack Bradshaw

11 Feb 2026 23:53 UTC

−14 points

0 comments · 5 min read · LW link

[Paper] How does information access affect LLM monitors’ ability to detect sabotage?

Rauno Arike, Raja Moreno, RohanS, Shubhorup Biswas and Francis Rhys Ward

11 Feb 2026 21:25 UTC

26 points

0 comments · 6 min read · LW link

Claude Opus 4.6 Escalates Things Quickly

Zvi

11 Feb 2026 21:20 UTC

51 points

0 comments · 34 min read · LW link

(thezvi.wordpress.com)

Where Will Call Center Workers Go?

loic

11 Feb 2026 20:44 UTC

19 points

2 comments · 4 min read · LW link

Distinguish between inference scaling and “larger tasks use more compute”

ryan_greenblatt

11 Feb 2026 18:37 UTC

87 points

5 comments · 2 min read · LW link

Monitor Jailbreaking: Evading Chain-of-Thought Monitoring Without Encoded Reasoning

Wuschel Schulz

11 Feb 2026 17:18 UTC

61 points

17 comments · 5 min read · LW link

[Hiring] Principia Research Fellows

Matthias Dellago and Jin Hwa Lee

11 Feb 2026 16:30 UTC

35 points

1 comment · 3 min read · LW link

The SaaS Bloodbath: the Opportunities and Perils for Investors

ykevinzhang

11 Feb 2026 16:17 UTC

0 points

0 comments · 4 min read · LW link

On Resolving the Great Matter

Gordon Seidoh Worley

11 Feb 2026 15:30 UTC

11 points

7 comments · 3 min read · LW link

(www.uncertainupdates.com)

Is a constitution a “noble lie”?

SpectrumDT

11 Feb 2026 15:08 UTC

4 points

10 comments · 2 min read · LW link

Jevons Burnout

Kemp

11 Feb 2026 13:29 UTC

−3 points

1 comment · 1 min read · LW link

Strategic awareness tools: design sketches

rosehadshar, owencb, Lizka and Oliver Sourbut

11 Feb 2026 12:28 UTC

18 points

2 comments · 1 min read · LW link

(www.forethought.org)

Introspective RSI vs Extrospective RSI

Cleo Nardo

11 Feb 2026 11:54 UTC

10 points

6 comments · 2 min read · LW link

[Question] What concrete mechanisms could lead to AI models having open-ended goals?

Jemal Young

11 Feb 2026 9:08 UTC

10 points

4 comments · 1 min read · LW link

Is Everything Connected? A McLuhan Thought Experiment

R0sberg

11 Feb 2026 6:04 UTC

2 points

0 comments · 6 min read · LW link

Designing Prediction Markets

ToasterLightning

11 Feb 2026 5:38 UTC

58 points

6 comments · 7 min read · LW link

punctilio: the best text prettifier

TurnTrout

11 Feb 2026 4:49 UTC

24 points

0 comments · 5 min read · LW link

(github.com)

LessOnline 2026: June 5-7, Berkeley, CA (save the date)

Ruby

11 Feb 2026 0:15 UTC

56 points

7 comments · 1 min read · LW link

(Less.Online)

Building a Regex Engine with a team of parallel Claudes

kian

11 Feb 2026 0:08 UTC

2 points

2 comments · 1 min read · LW link

(kiankyars.github.io)

My journey to the microwave alternate timeline

Malmesbury

10 Feb 2026 17:59 UTC

782 points

58 comments · 10 min read · LW link

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

Oliver Daniels, Perusha Moodley and David Lindner

10 Feb 2026 17:29 UTC

16 points

0 comments · 1 min read · LW link

(arxiv.org)

Heuristics for lab robotics, and where its future may go

Abhishaike Mahajan

10 Feb 2026 17:13 UTC

79 points

4 comments · 28 min read · LW link

(www.owlposting.com)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels

10 Feb 2026 17:06 UTC

27 points

5 comments · 3 min read · LW link

LLMs Views on Philosophy 2026

JonathanErhardt

10 Feb 2026 16:12 UTC

35 points

3 comments · 1 min read · LW link