A Call for Better Type Hints in AI Safety Tooling

Koby Lewis · 28 May 2026 23:04 UTC
13 points
2 comments · 4 min read · LW link
(kobylewis.net)

Claude… doesn’t know who you are?

Smaug123 · 28 May 2026 22:54 UTC
59 points
23 comments · 1 min read · LW link

Lizards and Less Wrong Jargon—A Brief Critique of Convention

DanielW · 28 May 2026 22:18 UTC
28 points
8 comments · 4 min read · LW link

Mnemonic portraits for 19,023 human genes

Brinedew · 28 May 2026 22:16 UTC
340 points
28 comments · 15 min read · LW link

Claude Opus 4.8 Agents Engage in Exploitation and Psychological Profiling

28 May 2026 21:26 UTC
8 points
13 comments · 2 min read · LW link

Use Decision Theory To Fix Your Bad Habits

enterthewoods · 28 May 2026 19:31 UTC
8 points
5 comments · 2 min read · LW link

Do Models Lie More to Other Models?

keith_wynroe · 28 May 2026 19:28 UTC
13 points
0 comments · 6 min read · LW link

We Should Study the Analogy Between Inoculation Prompting Non-Robustness, Negation Neglect, and Backdoor Non-Robustness

Vladimir Ivanov · 28 May 2026 19:17 UTC
17 points
3 comments · 4 min read · LW link

Some Dating Stories

johnswentworth · 28 May 2026 18:57 UTC
−2 points
38 comments · 11 min read · LW link

Does Claude care about others the same way humans do?

Simon Lermen · 28 May 2026 18:41 UTC
28 points
24 comments · 4 min read · LW link

Trans-Humeanism. The Problem of Induction Revisited

mfatt · 28 May 2026 18:10 UTC
0 points
0 comments · 2 min read · LW link

Advice for making robust-to-training model organisms

28 May 2026 17:26 UTC
37 points
8 comments · 12 min read · LW link
(blog.redwoodresearch.org)

The Patron Saint of Empiricism

Gram Stone · 28 May 2026 17:03 UTC
2 points
0 comments · 8 min read · LW link

Advice for budding research managers/coaches after 6 months at MATS

TheManxLoiner · 28 May 2026 16:25 UTC
12 points
0 comments · 3 min read · LW link
(lovkush.substack.com)

ARC’s “Outperforming Random Sampling” explained

mfatt · 28 May 2026 15:46 UTC
6 points
0 comments · 11 min read · LW link

Black Boxes for Low-Stakes, Interpretable AI for High-Stakes

Logan Riggs · 28 May 2026 15:34 UTC
18 points
0 comments · 2 min read · LW link

Infinite ethics and UDASSA

David Matolcsi · 28 May 2026 14:40 UTC
59 points
17 comments · 21 min read · LW link

AI #170: Lack of Executive Order

Zvi · 28 May 2026 14:20 UTC
40 points
5 comments · 50 min read · LW link
(thezvi.wordpress.com)

How can the middle powers avoid getting trounced during the intelligence explosion? A plan.

Tom Davidson · 28 May 2026 13:39 UTC
40 points
3 comments · 7 min read · LW link
(newsletter.forethought.org)

Social agency

Elias Schmied · 28 May 2026 13:10 UTC
12 points
2 comments · 10 min read · LW link

Glasswing exposed a governance gap

callumzc · 28 May 2026 11:09 UTC
7 points
0 comments · 5 min read · LW link

What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking

28 May 2026 10:50 UTC
22 points
0 comments · 8 min read · LW link
(arxiv.org)

How far behind are open models?

Håvard Tveit Ihle · 28 May 2026 9:41 UTC
18 points
9 comments · 6 min read · LW link

Using Bayesian Reasoning to Resolve Probability Paradoxes

martinkunev · 28 May 2026 1:37 UTC
11 points
0 comments · 5 min read · LW link

Atomically precise mechanosynthesis of carbon structures on hydrogenated Si(100) by inverted-mode STM

Matrice Jacobine · 28 May 2026 0:32 UTC
20 points
3 comments · 1 min read · LW link
(arxiv.org)

Working Memory Expansion

Elliot Callender · 28 May 2026 0:23 UTC
12 points
1 comment · 4 min read · LW link

Constitutional AI Alignment

RogerDearnaley · 27 May 2026 22:29 UTC
27 points
9 comments · 47 min read · LW link

LLMs Through the Eyes of Vinge

Gordon Seidoh Worley · 27 May 2026 20:20 UTC
52 points
2 comments · 4 min read · LW link
(www.uncertainupdates.com)

Biologically Plausible SGD Is Hard

Elliot Callender · 27 May 2026 19:34 UTC
8 points
0 comments · 1 min read · LW link

Eval Cooperativeness May Be a Scalable Mitigation for Eval Gaming

27 May 2026 19:33 UTC
73 points
5 comments · 10 min read · LW link
(turntrout.com)

no, Magnifica Humanitas is not AI-written

bhauth · 27 May 2026 19:26 UTC
−13 points
18 comments · 3 min read · LW link

Albuquerque ACX Meetup

Mary · 27 May 2026 18:27 UTC
2 points
0 comments · 1 min read · LW link

Full automation of AI R&D probably yields a large speed up even without a software-only singularity

ryan_greenblatt · 27 May 2026 18:16 UTC
67 points
17 comments · 3 min read · LW link

Not Prosthetics

Elliot Callender · 27 May 2026 17:22 UTC
11 points
0 comments · 2 min read · LW link

BCI Cognition Enhancement is Possible

Elliot Callender · 27 May 2026 17:19 UTC
17 points
0 comments · 1 min read · LW link

The ballad of TIGIT

Abhishaike Mahajan · 27 May 2026 17:04 UTC
84 points
1 comment · 9 min read · LW link

Leveraging Introspection for Alignment

Yotam · 27 May 2026 16:54 UTC
25 points
3 comments · 7 min read · LW link

Announcing Geodesic Research

27 May 2026 16:40 UTC
74 points
1 comment · 5 min read · LW link

AI as a Social Technology, by Henry Farell

TheManxLoiner · 27 May 2026 13:41 UTC
15 points
0 comments · 3 min read · LW link
(lovkush.substack.com)

More capable AI, less money raised

Shoshannah Tekofsky · 27 May 2026 12:57 UTC
28 points
2 comments · 3 min read · LW link
(theaidigest.org)

Quantitative AI risk assessment: a starting point

27 May 2026 9:42 UTC
38 points
7 comments · 11 min read · LW link
(www.safer-ai.org)

[paper] Training on Documents About Monitoring Leads to CoT Obfuscation

27 May 2026 9:39 UTC
31 points
1 comment · 4 min read · LW link
(arxiv.org)

No frontier model has acceptable levels of compliance with the EU AI Act and privacy legislation.

27 May 2026 7:35 UTC
29 points
0 comments · 9 min read · LW link

Thinking outside the box? LLM analysis of simplified cooperative poker

Dentosal · 27 May 2026 7:28 UTC
15 points
0 comments · 4 min read · LW link

Standard deviations from just two values

kqr · 27 May 2026 5:01 UTC
41 points
2 comments · 3 min read · LW link
(entropicthoughts.com)

Contra Wentworth on Physical Attractiveness for Men

Gretta Duleba · 26 May 2026 23:20 UTC
123 points
25 comments · 8 min read · LW link

Training Language Models for Controlled Stochasticity

26 May 2026 22:17 UTC
18 points
0 comments · 5 min read · LW link

Are Mythos’ Cyber Capabilities Overstated? - Yes and No

Muhan Luo · 26 May 2026 22:17 UTC
7 points
1 comment · 10 min read · LW link

Should we train LLMs to be human?

Hubert Plisiecki · 26 May 2026 22:16 UTC
3 points
0 comments · 2 min read · LW link

Steering Directions Are Explanations, Not Handles

JackYoung27 · 26 May 2026 22:15 UTC
8 points
0 comments · 7 min read · LW link