Long-term risks from ideological fanaticism

12 Feb 2026 23:26 UTC
99 points
12 comments
84 min read

(Re)Discovering Natural Laws

Margot
12 Feb 2026 21:45 UTC
13 points
0 comments
17 min read

An Ontology of Representations: Limits of Universality

Margot
12 Feb 2026 21:43 UTC
23 points
1 comment
39 min read

A Closer Look at the “Societies of Thought” Paper

Against Moloch
12 Feb 2026 21:38 UTC
10 points
0 comments
3 min read
(againstmoloch.com)

models have some pretty funny attractor states

12 Feb 2026 21:14 UTC
275 points
38 comments
18 min read

Stay in your human loop

benjamin ar
12 Feb 2026 21:05 UTC
22 points
0 comments
5 min read
(bjar.substack.com)

The case for industrial evals

12 Feb 2026 20:45 UTC
16 points
0 comments
23 min read

Multiverse sampling assumption

avturchin
12 Feb 2026 19:59 UTC
12 points
0 comments
5 min read

What We Learned from Briefing 140+ Lawmakers on the Threat from AI

leticiagarcia
12 Feb 2026 19:53 UTC
174 points
7 comments
14 min read
(substack.com)

Paper: Prompt Optimization Makes Misalignment Legible

12 Feb 2026 19:45 UTC
63 points
8 comments
8 min read

Claude’s Constitution

PeterMcCluskey
12 Feb 2026 19:44 UTC
15 points
4 comments
6 min read
(bayesianinvestor.com)

Human-like metacognitive skills will reduce LLM slop and aid alignment and capabilities

Seth Herd
12 Feb 2026 19:38 UTC
48 points
16 comments
18 min read

Good AI Epistemics as an Offramp from the Intelligence Explosion

Ben Goldhaber
12 Feb 2026 19:18 UTC
23 points
2 comments
3 min read

How Secret Loyalty Differs from Standard Backdoor Threats

Joe Kwon
12 Feb 2026 18:48 UTC
23 points
4 comments
12 min read

You get about.… how many words exactly?

Raemon
12 Feb 2026 18:06 UTC
21 points
1 comment
7 min read

Basic Legibility Protocols Improve Trusted Monitoring

12 Feb 2026 17:50 UTC
8 points
4 comments
11 min read

A research agenda for the final year

Mitchell_Porter
12 Feb 2026 17:24 UTC
13 points
22 comments
3 min read

Polysemanticity is a Misnomer

Shiva's Right Foot
12 Feb 2026 17:22 UTC
11 points
0 comments
3 min read

Optimal Timing for Superintelligence: Mundane Considerations for Existing People

Nick Bostrom
12 Feb 2026 17:06 UTC
49 points
89 comments
72 min read

How do we (more) safely defer to AIs?

12 Feb 2026 16:55 UTC
83 points
5 comments
72 min read

A Conceptual Framework for Exploration Hacking

12 Feb 2026 16:33 UTC
26 points
2 comments
9 min read

AI #155: Welcome to Recursive Self-Improvement

Zvi
12 Feb 2026 16:10 UTC
52 points
5 comments
56 min read
(thezvi.wordpress.com)

The Facade of AI Safety Will Crumble

Liron
12 Feb 2026 15:57 UTC
36 points
11 comments
4 min read
(doomdebates.com)

The history of light

Kotlopou
12 Feb 2026 14:16 UTC
16 points
0 comments
1 min read
(beatingthehydra.substack.com)

Three Worlds Collide assumes calibration is solved

Vyacheslav Ladischenski (Slava)
12 Feb 2026 4:28 UTC
7 points
1 comment
3 min read

Research note: A simpler AI timelines model predicts 99% AI R&D automation in ~2032

Thomas Kwa
12 Feb 2026 0:13 UTC
69 points
15 comments
8 min read
(metr.org)

Timeless Engineering

Jack Bradshaw
11 Feb 2026 23:53 UTC
−14 points
0 comments
5 min read

[Paper] How does information access affect LLM monitors’ ability to detect sabotage?

11 Feb 2026 21:25 UTC
26 points
0 comments
6 min read

Claude Opus 4.6 Escalates Things Quickly

Zvi
11 Feb 2026 21:20 UTC
51 points
0 comments
34 min read
(thezvi.wordpress.com)

Where Will Call Center Workers Go?

loic
11 Feb 2026 20:44 UTC
19 points
2 comments
4 min read

Distinguish between inference scaling and “larger tasks use more compute”

ryan_greenblatt
11 Feb 2026 18:37 UTC
87 points
5 comments
2 min read

Monitor Jailbreaking: Evading Chain-of-Thought Monitoring Without Encoded Reasoning

Wuschel Schulz
11 Feb 2026 17:18 UTC
61 points
17 comments
5 min read

[Hiring] Principia Research Fellows

11 Feb 2026 16:30 UTC
35 points
1 comment
3 min read

The SaaS Bloodbath: the Opportunities and Perils for Investors

ykevinzhang
11 Feb 2026 16:17 UTC
0 points
0 comments
4 min read

On Resolving the Great Matter

Gordon Seidoh Worley
11 Feb 2026 15:30 UTC
11 points
7 comments
3 min read
(www.uncertainupdates.com)

Is a constitution a “noble lie”?

SpectrumDT
11 Feb 2026 15:08 UTC
4 points
10 comments
2 min read

Jevons Burnout

Kemp
11 Feb 2026 13:29 UTC
−3 points
1 comment
1 min read

Strategic awareness tools: design sketches

11 Feb 2026 12:28 UTC
18 points
2 comments
1 min read
(www.forethought.org)

Introspective RSI vs Extrospective RSI

Cleo Nardo
11 Feb 2026 11:54 UTC
10 points
6 comments
2 min read

[Question] What concrete mechanisms could lead to AI models having open-ended goals?

Jemal Young
11 Feb 2026 9:08 UTC
10 points
4 comments
1 min read

Is Everything Connected? A McLuhan Thought Experiment

R0sberg
11 Feb 2026 6:04 UTC
2 points
0 comments
6 min read

Designing Prediction Markets

ToasterLightning
11 Feb 2026 5:38 UTC
58 points
6 comments
7 min read

punctilio: the best text prettifier

TurnTrout
11 Feb 2026 4:49 UTC
24 points
0 comments
5 min read
(github.com)

LessOnline 2026: June 5-7, Berkeley, CA (save the date)

Ruby
11 Feb 2026 0:15 UTC
56 points
7 comments
1 min read
(Less.Online)

Building a Regex Engine with a team of parallel Claudes

kian
11 Feb 2026 0:08 UTC
2 points
2 comments
1 min read
(kiankyars.github.io)

My journey to the microwave alternate timeline

Malmesbury
10 Feb 2026 17:59 UTC
782 points
58 comments
10 min read

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

10 Feb 2026 17:29 UTC
16 points
0 comments
1 min read
(arxiv.org)

Heuristics for lab robotics, and where its future may go

Abhishaike Mahajan
10 Feb 2026 17:13 UTC
79 points
4 comments
28 min read
(www.owlposting.com)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels
10 Feb 2026 17:06 UTC
27 points
5 comments
3 min read

LLMs Views on Philosophy 2026

JonathanErhardt
10 Feb 2026 16:12 UTC
35 points
3 comments
1 min read