My journey to the microwave alternate timeline

Malmesbury
10 Feb 2026 17:59 UTC
782 points
58 comments
10 min read
LW link

Stress-Testing Alignment Audits With Prompt-Level Strategic Deception

10 Feb 2026 17:29 UTC
16 points
0 comments
1 min read
LW link
(arxiv.org)

Heuristics for lab robotics, and where its future may go

Abhishaike Mahajan
10 Feb 2026 17:13 UTC
79 points
4 comments
28 min read
LW link
(www.owlposting.com)

On Meta-Level Adversarial Evaluations of (White-Box) Alignment Auditing

Oliver Daniels
10 Feb 2026 17:06 UTC
27 points
5 comments
3 min read
LW link

LLMs Views on Philosophy 2026

JonathanErhardt
10 Feb 2026 16:12 UTC
35 points
3 comments
1 min read
LW link

Claude Opus 4.6: System Card Part 2: Frontier Alignment

Zvi
10 Feb 2026 16:10 UTC
46 points
0 comments
18 min read
LW link
(thezvi.wordpress.com)

Coping with Deconversion

Benjamin Hendricks
10 Feb 2026 13:26 UTC
21 points
22 comments
1 min read
LW link

“Recursive Self-Improvement” Is Three Different Things

Ihor Kendiukhov
10 Feb 2026 12:49 UTC
25 points
6 comments
2 min read
LW link

SAE Feature Matchmaking (Layer-to-Layer)

Mitali M
10 Feb 2026 4:32 UTC
9 points
0 comments
1 min read
LW link

Monday AI Radar #12

Against Moloch
10 Feb 2026 4:28 UTC
16 points
1 comment
7 min read
LW link
(againstmoloch.com)

Ending Parking Space Saving

jefftk
10 Feb 2026 2:30 UTC
26 points
4 comments
2 min read
LW link
(www.jefftk.com)

[Question] Should we consider Meta to be a criminal enterprise?

ChristianKl
10 Feb 2026 2:10 UTC
43 points
23 comments
1 min read
LW link

[Question] OK, what’s the difference between coherence and representation theorems?

Algon
10 Feb 2026 0:45 UTC
15 points
7 comments
2 min read
LW link

Introspective Interpretability: a Definition, Motivation, and Open Problems

Belinda Li
9 Feb 2026 23:53 UTC
10 points
0 comments
13 min read
LW link

Job Listing (Closed): CBAI Operations Associate

9 Feb 2026 23:36 UTC
1 point
0 comments
1 min read
LW link

Weight-Sparse Circuits May Be Interpretable Yet Unfaithful

jacob_drori
9 Feb 2026 23:25 UTC
136 points
5 comments
8 min read
LW link

Gwern’s 2025 Inkhaven Writing Interview

gwern
9 Feb 2026 22:11 UTC
49 points
2 comments
31 min read
LW link
(gwern.net)

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare

Zvi
9 Feb 2026 21:30 UTC
36 points
5 comments
26 min read
LW link
(thezvi.wordpress.com)

Closure

Vadim Golub
9 Feb 2026 21:17 UTC
3 points
0 comments
2 min read
LW link

Aurelius: Proposing Alignment as an Emergent Property

Austin McCaffrey
9 Feb 2026 20:13 UTC
−5 points
0 comments
1 min read
LW link
(github.com)

Distributed vs centralized agents

Richard_Ngo
9 Feb 2026 20:06 UTC
51 points
9 comments
1 min read
LW link

Stone Age Billionaire Can’t Words Good

Eneasz
9 Feb 2026 18:51 UTC
169 points
95 comments
12 min read
LW link
(deathisbad.substack.com)

Do Models Continue Misaligned Actions? [eval]

Jordan Taylor
9 Feb 2026 16:59 UTC
76 points
12 comments
11 min read
LW link

the extraordinary as mundane

Derek DeHart
9 Feb 2026 16:26 UTC
3 points
2 comments
5 min read
LW link
(dehart.substack.com)

Large Language Models Live in Time

Eleni Angelou
9 Feb 2026 15:08 UTC
20 points
2 comments
4 min read
LW link

Sympathy for the Model, or, Welfare Concerns as Takeover Risk

J Bostock
9 Feb 2026 14:19 UTC
42 points
37 comments
3 min read
LW link

Opus 4.6 Reasoning Doesn’t Verbalize Alignment Faking, but Behavior Persists

9 Feb 2026 12:55 UTC
118 points
13 comments
8 min read
LW link

Does an AI Society Need an Immune System? Accepting Yampolskiy’s Impossibility Results

Hiroshi Yamakawa
9 Feb 2026 12:32 UTC
13 points
0 comments
10 min read
LW link

Can Hardware Save Us from Software?

Alvin Ånestrand
9 Feb 2026 11:57 UTC
23 points
2 comments
12 min read
LW link
(forecastingaifutures.substack.com)

Complexity Science as Bridge to Eastern Philosophy

pchvykov
9 Feb 2026 10:40 UTC
1 point
2 comments
2 min read
LW link

Design sketches for a more sensible world

9 Feb 2026 10:22 UTC
26 points
2 comments
4 min read
LW link
(www.forethought.org)

Design sketches for angels-on-the-shoulder

9 Feb 2026 9:52 UTC
23 points
0 comments
2 min read
LW link
(www.forethought.org)

Model Integrity and Character

Oliver Klingefjord
9 Feb 2026 8:15 UTC
12 points
3 comments
6 min read
LW link

Eleven Practical Ways to Prepare for AGI

John-Clark Levin
9 Feb 2026 7:57 UTC
24 points
16 comments
5 min read
LW link

The difference in risk/reward for Humanity as a super-organism vs. as a collection of individuals

ZhanRocks
9 Feb 2026 7:53 UTC
1 point
0 comments
1 min read
LW link

Answer in your head

throwaway835543
9 Feb 2026 7:41 UTC
16 points
2 comments
3 min read
LW link

Evaluating Conflict of Interest

warner
9 Feb 2026 7:30 UTC
1 point
0 comments
2 min read
LW link

Three visions for diffuse control

Alek Westover
9 Feb 2026 6:41 UTC
8 points
0 comments
3 min read
LW link

Observations and Complexity

Ape in the coat
9 Feb 2026 6:13 UTC
9 points
2 comments
3 min read
LW link
(apeinthecoat102771.substack.com)

A Perfect Resurrection

MarkelKori
9 Feb 2026 1:33 UTC
9 points
16 comments
3 min read
LW link

Empathy Has Outworn Its Place in Politics

Character#2736
8 Feb 2026 23:22 UTC
−26 points
8 comments
4 min read
LW link

The Two-Board Problem: Training Environment for Research Agents

Valerii K.
8 Feb 2026 23:13 UTC
4 points
0 comments
9 min read
LW link

Join My New Movement for the Post-AI World

E.G. Blee-Goldman
8 Feb 2026 22:18 UTC
0 points
0 comments
7 min read
LW link

Donations, The Fifth Year

jenn
8 Feb 2026 22:04 UTC
39 points
0 comments
4 min read
LW link
(www.jenn.site)

Every Measurement Has a Scale

CarolusRenniusVitellius
8 Feb 2026 20:07 UTC
17 points
4 comments
4 min read
LW link
(charlesr-w.github.io)

UtopiaBench

nielsrolf
8 Feb 2026 18:19 UTC
67 points
10 comments
1 min read
LW link

Smokey, This is not ’Nam Or: [Already] over the [red] line!

Davidmanheim
8 Feb 2026 12:24 UTC
110 points
22 comments
4 min read
LW link

The optimal age to freeze eggs is 19

GeneSmith
8 Feb 2026 9:44 UTC
195 points
48 comments
6 min read
LW link

It Is Reasonable To Research How To Use Model Internals In Training

Neel Nanda
8 Feb 2026 3:44 UTC
103 points
15 comments
4 min read
LW link

Claude’s Bad Primer Fanfic

abramdemski
8 Feb 2026 0:39 UTC
24 points
12 comments
54 min read
LW link