Thoughts on Claude Fable’s silent safeguards

Andy Arditi · 10 Jun 2026 23:35 UTC
51 points
20 comments · 10 min read · LW link

Notes on Algorithms

Menotim · 10 Jun 2026 23:28 UTC
7 points
0 comments · 25 min read · LW link

[Question] Fuel Crisis: Situation Modeling Thread

Nicholas Kross · 10 Jun 2026 21:59 UTC
8 points
7 comments · 1 min read · LW link

[Question] Fuel Crisis: Justified Practical Advice Thread

Nicholas Kross · 10 Jun 2026 21:59 UTC
14 points
0 comments · 1 min read · LW link

Solsong Chord Updates

jefftk · 10 Jun 2026 21:00 UTC
10 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Dario Amodei—Policy on the AI Exponential

DW11 · 10 Jun 2026 20:56 UTC
22 points
0 comments · 1 min read · LW link

Anthropic did not call for a pause on AI

10 Jun 2026 20:02 UTC
80 points
5 comments · 5 min read · LW link
(controlai.news)

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

10 Jun 2026 17:58 UTC
240 points
20 comments · 4 min read · LW link

These Three Thaumata

chaosmage · 10 Jun 2026 16:42 UTC
11 points
0 comments · 1 min read · LW link

Sequent: scale and automation for higher confidence in alignment

10 Jun 2026 15:37 UTC
276 points
2 comments · 11 min read · LW link
(sequent.org)

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

RobinHa · 10 Jun 2026 15:21 UTC
65 points
5 comments · 9 min read · LW link
(robinhaselhorst.com)

I Started an AI Safety Research Org and Think These 7 Things Matter

Alfie Lamerton · 10 Jun 2026 14:54 UTC
20 points
0 comments · 5 min read · LW link

Phonies

IanWS · 10 Jun 2026 14:17 UTC
10 points
0 comments · 2 min read · LW link
(write.ianwsperber.com)

Machinic Psychopharmacology: Do LLMs Self-Medicate?

10 Jun 2026 14:15 UTC
124 points
11 comments · 23 min read · LW link

I didn’t see any METR graph extrapolations so here.

Vermillion · 10 Jun 2026 12:50 UTC
15 points
2 comments · 1 min read · LW link

ML4Good Summer 2026 Bootcamps - Applications Open!

Jack_S · 10 Jun 2026 11:07 UTC
3 points
0 comments · 2 min read · LW link

Tracing Eval-Awareness Emergence Through Training of OLMo 3

10 Jun 2026 10:13 UTC
43 points
6 comments · 6 min read · LW link

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

Alex Amadori · 10 Jun 2026 9:44 UTC
73 points
26 comments · 16 min read · LW link
(alexamadori.substack.com)

Three types of model organism

Francis Rhys Ward · 10 Jun 2026 8:50 UTC
45 points
7 comments · 2 min read · LW link

Even “illegible” Mythos reasoning traces seem pretty legible

faul_sname · 10 Jun 2026 8:49 UTC
157 points
23 comments · 2 min read · LW link

MythOS—The Rise of AGI

Byron Lee · 10 Jun 2026 6:06 UTC
−19 points
0 comments · 4 min read · LW link

Under Violet

Hide · 10 Jun 2026 1:30 UTC
4 points
0 comments · 10 min read · LW link
(hidefromit.substack.com)

LessOnline 2026

nomagicpill · 9 Jun 2026 23:24 UTC
3 points
0 comments · 5 min read · LW link
(nomagicpill.substack.com)

“Programmer Science Fiction: My case for a new sub-genre”, Sam T. Oates 2026

gwern · 9 Jun 2026 23:23 UTC
47 points
10 comments · 1 min read · LW link
(stoates.substack.com)

The Disutility of FDT: on Utility Functions and Voting, Insights from Behavioral Economics and Decision Theory

DanielW · 9 Jun 2026 23:13 UTC
5 points
3 comments · 8 min read · LW link

Three Labs With a Plan and A Memorandum

Zvi · 9 Jun 2026 22:40 UTC
45 points
0 comments · 12 min read · LW link
(thezvi.wordpress.com)

Harmfulness Directions in OLMo

9 Jun 2026 22:31 UTC
20 points
0 comments · 11 min read · LW link

“Self-Control” Is A (Neurological) Type Error

Elliot Callender · 9 Jun 2026 21:34 UTC
−6 points
0 comments · 1 min read · LW link

Towards a Formal Scientific Epistemology

Richard_Ngo · 9 Jun 2026 20:31 UTC
75 points
9 comments · 7 min read · LW link
(www.mindthefuture.info)

Some Interesting Papers on RLVR

CarolusRenniusVitellius · 9 Jun 2026 19:00 UTC
20 points
5 comments · 4 min read · LW link

A Mike’s-Eye View of ARC’s Research

Mikewins · 9 Jun 2026 18:30 UTC
64 points
1 comment · 11 min read · LW link
(www.alignment.org)

An LLM Flagged My Paper About LLMs Flagging Things.

Failfinder70 · 9 Jun 2026 18:00 UTC
5 points
0 comments · 2 min read · LW link

The Skeptic, the Bayesian, Empiricism and Claims to Know:

DanielW · 9 Jun 2026 17:52 UTC
4 points
4 comments · 4 min read · LW link

Claude Fable 5 and Mythos 5 [Linkpost]

fluxxrider · 9 Jun 2026 17:19 UTC
42 points
10 comments · 1 min read · LW link

5 Things I Learned About People From Doing Stand-Up Comedy

Luise Woehlke · 9 Jun 2026 15:52 UTC
−4 points
5 comments · 2 min read · LW link
(open.substack.com)

The Machines Lack Honour

Raymond Douglas · 9 Jun 2026 15:30 UTC
169 points
21 comments · 12 min read · LW link

High Dynamic Range DIY Air Testing

jefftk · 9 Jun 2026 15:00 UTC
13 points
0 comments · 4 min read · LW link
(www.jefftk.com)

AI Super PAC tracker

Mikhail Samin · 9 Jun 2026 14:57 UTC
26 points
0 comments · 1 min read · LW link
(electhumans.com)

[Linkpost] Evals for “SPI-incompatible” behavior & reasoning: Guide to initial research

Anthony DiGiovanni · 9 Jun 2026 13:44 UTC
23 points
0 comments · 1 min read · LW link
(docs.google.com)

Subversion-Resistance for Free from Formal Verification

Adam Chlipala · 9 Jun 2026 12:01 UTC
7 points
0 comments · 7 min read · LW link

LLMs and almost good code

kqr · 9 Jun 2026 7:21 UTC
33 points
9 comments · 3 min read · LW link
(entropicthoughts.com)

On Slop

Jan · 9 Jun 2026 1:08 UTC
32 points
4 comments · 7 min read · LW link
(universalprior.substack.com)

How to build a cancer vaccine, and whether they will work this time

Abhishaike Mahajan · 8 Jun 2026 20:45 UTC
58 points
9 comments · 25 min read · LW link
(www.owlposting.com)

Efficient tradeoffs and the safety-usefulness tradeoff model

Buck · 8 Jun 2026 20:28 UTC
42 points
1 comment · 8 min read · LW link

Accelerated Skill Learning via Dream Engineering and Biofeedback

Elliot Callender · 8 Jun 2026 20:08 UTC
5 points
2 comments · 3 min read · LW link

How valuable are weak AI safety regulations?

MichaelDickens · 8 Jun 2026 18:24 UTC
28 points
0 comments · 6 min read · LW link

How to reduce capability degradation from off-model SFT

8 Jun 2026 16:24 UTC
21 points
0 comments · 3 min read · LW link

The Next Swan: Frank Ramsey, Variable Hypotheticals, and the Bet on Induction

Ramseyian · 8 Jun 2026 12:01 UTC
4 points
0 comments · 18 min read · LW link

Coverage-driven alignment—What ‘Teaching Claude Why’ can borrow from AV verification

Yoav Hollander · 8 Jun 2026 11:42 UTC
16 points
4 comments · 14 min read · LW link
(blog.foretellix.com)

Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment

Sayhan Yalvaçer · 8 Jun 2026 7:06 UTC
96 points
8 comments · 3 min read · LW link