A relatively brief explanation of Boltzmann Brains

Eliezer Yudkowsky, 16 May 2026 21:19 UTC
206 points
155 comments, 4 min read

Benchmarking Real Work

16 May 2026 20:43 UTC
30 points
2 comments, 4 min read

Critique Systems, Not Reality

Morphism, 16 May 2026 19:11 UTC
5 points
1 comment, 25 min read
(thothhermes.substack.com)

Trying to use NLAs to find out how Qwen 2.5 7B does multiplication

Hannes Thurnherr, 16 May 2026 19:05 UTC
23 points
4 comments, 6 min read

A Year Late, Claude Finally Beats Pokémon

Julian Bradshaw, 16 May 2026 7:05 UTC
162 points
12 comments, 9 min read

NLA Verbalizations on AuditBench: Llama 70B

Realmbird, 16 May 2026 5:25 UTC
10 points
0 comments, 3 min read

An Introduction to Exemplar Partitioning for Mechanistic Interpretability

Jessica Rumbelow, 16 May 2026 3:58 UTC
69 points
7 comments, 11 min read
(www.leap-labs.com)

An Argument for Analogies

James Stephen Brown, 16 May 2026 2:21 UTC
11 points
0 comments, 3 min read

Incriminating misaligned AI models via distillation

15 May 2026 21:43 UTC
115 points
12 comments, 5 min read

Critical Thinking as a Gym Schedule

Alrenous, 15 May 2026 20:49 UTC
0 points
4 comments, 3 min read

Why I am not too worried about AIpocalypse: Scott Alexander vs Nicolaus Copernicus

Shmi, 15 May 2026 20:31 UTC
7 points
15 comments, 2 min read

Risk reports need to address deployment-time spread of misalignment

Alex Mallen, 15 May 2026 18:20 UTC
64 points
1 comment, 5 min read

Monthly Roundup #42: May 2026

Zvi, 15 May 2026 16:50 UTC
30 points
2 comments, 24 min read
(thezvi.wordpress.com)

Mechanistic estimation for expectations of random products

15 May 2026 16:50 UTC
50 points
0 comments, 5 min read
(www.alignment.org)

Clarifying the Darwinian Honeymoon

Elias Schmied, 15 May 2026 16:23 UTC
20 points
6 comments, 3 min read

Announcing the Center for Shared AI Prosperity

Dylan Matthews, 15 May 2026 12:57 UTC
39 points
13 comments, 2 min read

MATS 9 Retrospective & Advice

beyarkay (Boyd Kane), 15 May 2026 12:30 UTC
199 points
11 comments, 18 min read
(boydkane.com)

Data Quality is Way Underrated, and We Should Start Funding It.

Osapinion, 15 May 2026 4:07 UTC
4 points
0 comments, 2 min read
(substack.com)

Don’t be too Clever to Take Obvious Advice

Hide, 15 May 2026 3:01 UTC
95 points
26 comments, 2 min read
(hidefromit.substack.com)

Some observations about NLA explanations

loops, 15 May 2026 2:15 UTC
21 points
0 comments, 3 min read

The hard core of alignment (is robustifying RL)

Cole Wyeth, 15 May 2026 1:02 UTC
39 points
12 comments, 13 min read

Convergent Abstraction Hypothesis

Jan_Kulveit, 15 May 2026 0:04 UTC
122 points
20 comments, 6 min read

Emma Baker on ADHD

koratkar, 14 May 2026 23:29 UTC
8 points
2 comments, 3 min read
(emma00baker.substack.com)

Designing AI factual claims for “easy verification”

Raemon, 14 May 2026 23:23 UTC
33 points
17 comments, 2 min read

Automated Alignment is Harder Than You Think

14 May 2026 22:01 UTC
143 points
6 comments, 3 min read
(arxiv.org)

2B scoring model flags out-of-domain misalignment, suggesting specialist judges have potential for audits

burnssa, 14 May 2026 20:00 UTC
8 points
0 comments, 6 min read

The safe-to-dangerous shift is a fundamental problem for eval realism; but also for measuring awareness

14 May 2026 17:05 UTC
59 points
3 comments, 3 min read

AI #168: Not Leading the Future

Zvi, 14 May 2026 14:10 UTC
38 points
2 comments, 45 min read
(thezvi.wordpress.com)

Why Ensuring Flourishing Is Not About Alignment

ofpetro, 14 May 2026 6:24 UTC
5 points
6 comments, 35 min read

Intervening on Sparse, Anchored Concepts

Sandy Fraser, 14 May 2026 4:35 UTC
24 points
3 comments, 10 min read

Algorithmic Perfection

zw5, 14 May 2026 3:44 UTC
5 points
1 comment, 2 min read

Models finding software vulnerabilities is not the primary source of cybersecurity risk

lc, 14 May 2026 3:39 UTC
310 points
24 comments, 2 min read

Claude is Now Alignment-Pretrained

RogerDearnaley, 13 May 2026 23:19 UTC
87 points
9 comments, 1 min read
(www.anthropic.com)

MATS Autumn 2026 Fellowship Applications Now Open – Apply by June 7

13 May 2026 21:40 UTC
21 points
0 comments, 2 min read

Building Connections

13 May 2026 20:27 UTC
8 points
0 comments, 5 min read

A lack of introspective ability is not a lack of corrigibility

lc, 13 May 2026 20:23 UTC
26 points
3 comments, 1 min read

Cyber Lack of Security and AI Governance

Zvi, 13 May 2026 20:20 UTC
41 points
1 comment, 16 min read
(thezvi.wordpress.com)

Stickiness in AI Behavioral Design

James_T, 13 May 2026 19:55 UTC
10 points
0 comments, 14 min read
(www.forethought.org)

Predicting Rare LLM Failures with 30× Fewer Rollouts

13 May 2026 17:53 UTC
55 points
3 comments, 5 min read

Most “inner work” looks like entertainment.

Chris Lakin, 13 May 2026 17:51 UTC
48 points
10 comments, 2 min read

A Research Agenda for Secret Loyalties

13 May 2026 17:34 UTC
35 points
3 comments, 3 min read

Apollo Update May 2026

Marius Hobbhahn, 13 May 2026 16:43 UTC
48 points
0 comments, 1 min read
(www.apolloresearch.ai)

The case for fine-grained tracking of compute for AI

13 May 2026 16:00 UTC
36 points
17 comments, 9 min read
(forum.effectivealtruism.org)

Vibe Excel and the Future of White-Collar Work

ykevinzhang, 13 May 2026 15:39 UTC
13 points
5 comments, 6 min read

“Community organizer” is a double oxymoron

jchan, 13 May 2026 15:10 UTC
5 points
13 comments, 5 min read

Voters are surprisingly open to talking about AI risk

less_raichu, 13 May 2026 14:08 UTC
116 points
11 comments, 3 min read

Civilization as a tower of holes

Joe Rogero, 13 May 2026 13:48 UTC
24 points
3 comments, 4 min read
(subatomicarticles.com)

Applications Open for Impact Accelerator Program

High Impact Professionals, 13 May 2026 8:35 UTC
6 points
0 comments, 1 min read

Epistemic Immunodepression in the Age of AI

Tuyen Tran, 13 May 2026 5:49 UTC
15 points
5 comments, 2 min read

Lorxus Does Budget Inkhaven Again: 4/29, 4/30, Highlights, Postmortem

Lorxus, 13 May 2026 1:37 UTC
15 points
0 comments, 3 min read
(tiled-with-pentagons.blogspot.com)