Construct validity of Claude Opus 4.8’s System Card – A commentary

Maria Federica Martino Lena 11 Jun 2026 23:33 UTC
8 points
0 comments, 16 min read, LW link

you won’t one-shot a perfect system, but try anyway

PossiblyElaine 11 Jun 2026 22:43 UTC
7 points
1 comment, 4 min read, LW link
(possiblyelaine.substack.com)

Announcing the Next Phase of AI Forge

11 Jun 2026 21:27 UTC
11 points
0 comments, 2 min read, LW link

The long arc of alignment: second-order instrumental convergence

Emma Leonhart 11 Jun 2026 21:12 UTC
−2 points
0 comments, 3 min read, LW link

Newcomb’s problem from the grand-system and petty-system views

transhumanist_atom_understander 11 Jun 2026 20:58 UTC
12 points
0 comments, 5 min read, LW link

[New Paper] Prioritizing Risks from AI: A Delphi Study of 272 Experts

peterslattery 11 Jun 2026 20:57 UTC
14 points
0 comments, 2 min read, LW link
(airisk.mit.edu)

Telepathy Is (Algorithmically) Easy

Elliot Callender 11 Jun 2026 20:31 UTC
4 points
5 comments, 4 min read, LW link

Mortgage rate: 6.5% If indexed: 1.2%. Three Nobelists approve.

Bruce Middleton 11 Jun 2026 20:31 UTC
5 points
2 comments, 2 min read, LW link

[Question] Becoming a Researcher in a Non-EA-Priority Field vs Donating $100k / Year to EA Research?

Master Chief 11 Jun 2026 19:22 UTC
8 points
0 comments, 1 min read, LW link

AI #172: The First Fable

Zvi 11 Jun 2026 19:00 UTC
44 points
2 comments, 34 min read, LW link
(thezvi.wordpress.com)

Failing to Ragebait the New Gemma

11 Jun 2026 17:50 UTC
30 points
0 comments, 3 min read, LW link

Curating and evaluating high-impact legal research (Unjournal progress, resources)

david reinstein 11 Jun 2026 11:42 UTC
11 points
0 comments, 1 min read, LW link
(info.unjournal.org)

Models May Behave Worse When Eval Aware

11 Jun 2026 9:28 UTC
86 points
7 comments, 13 min read, LW link

Becoming a Researcher in a Non-EA-Priority Field vs Donating $100k / Year to EA Research

Master Chief 11 Jun 2026 2:28 UTC
8 points
0 comments, 1 min read, LW link

Inverse Rubric Optimization: A testbed for agent science

11 Jun 2026 1:44 UTC
9 points
0 comments, 1 min read, LW link
(fulcrum.inc)

Drawing Big Bright Lines for Cyber & Biological AI

Austin Morrissey 11 Jun 2026 0:55 UTC
−5 points
0 comments, 4 min read, LW link

Predictive Processing: Conscious when Training

Chamod Kalupahana 11 Jun 2026 0:06 UTC
13 points
1 comment, 2 min read, LW link

Thoughts on Claude Fable’s silent safeguards

Andy Arditi 10 Jun 2026 23:35 UTC
51 points
20 comments, 10 min read, LW link

Notes on Algorithms

Menotim 10 Jun 2026 23:28 UTC
7 points
0 comments, 25 min read, LW link

[Question] Fuel Crisis: Situation Modeling Thread

Nicholas Kross 10 Jun 2026 21:59 UTC
8 points
7 comments, 1 min read, LW link

[Question] Fuel Crisis: Justified Practical Advice Thread

Nicholas Kross 10 Jun 2026 21:59 UTC
14 points
0 comments, 1 min read, LW link

Solsong Chord Updates

jefftk 10 Jun 2026 21:00 UTC
10 points
0 comments, 1 min read, LW link
(www.jefftk.com)

Dario Amodei—Policy on the AI Exponential

DW11 10 Jun 2026 20:56 UTC
22 points
0 comments, 1 min read, LW link

Anthropic did not call for a pause on AI

10 Jun 2026 20:02 UTC
80 points
5 comments, 5 min read, LW link
(controlai.news)

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

10 Jun 2026 17:58 UTC
248 points
20 comments, 4 min read, LW link

These Three Thaumata

chaosmage 10 Jun 2026 16:42 UTC
11 points
0 comments, 1 min read, LW link

Sequent: scale and automation for higher confidence in alignment

10 Jun 2026 15:37 UTC
277 points
2 comments, 11 min read, LW link
(sequent.org)

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

RobinHa 10 Jun 2026 15:21 UTC
66 points
5 comments, 9 min read, LW link
(robinhaselhorst.com)

I Started an AI Safety Research Org and Think These 7 Things Matter

Alfie Lamerton 10 Jun 2026 14:54 UTC
20 points
0 comments, 5 min read, LW link

Phonies

IanWS 10 Jun 2026 14:17 UTC
10 points
0 comments, 2 min read, LW link
(write.ianwsperber.com)

Machinic Psychopharmacology: Do LLMs Self-Medicate?

10 Jun 2026 14:15 UTC
124 points
11 comments, 23 min read, LW link

I didn’t see any METR graph extrapolations so here.

Vermillion 10 Jun 2026 12:50 UTC
15 points
2 comments, 1 min read, LW link

ML4Good Summer 2026 Bootcamps - Applications Open!

Jack_S 10 Jun 2026 11:07 UTC
3 points
0 comments, 2 min read, LW link

Tracing Eval-Awareness Emergence Through Training of OLMo 3

10 Jun 2026 10:13 UTC
43 points
6 comments, 6 min read, LW link

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

Alex Amadori 10 Jun 2026 9:44 UTC
74 points
26 comments, 16 min read, LW link
(alexamadori.substack.com)

Three types of model organism

Francis Rhys Ward 10 Jun 2026 8:50 UTC
51 points
7 comments, 2 min read, LW link

Even “illegible” Mythos reasoning traces seem pretty legible

faul_sname 10 Jun 2026 8:49 UTC
160 points
23 comments, 2 min read, LW link

MythOS—The Rise of AGI

Byron Lee 10 Jun 2026 6:06 UTC
−19 points
0 comments, 4 min read, LW link

Under Violet

Hide 10 Jun 2026 1:30 UTC
4 points
0 comments, 10 min read, LW link
(hidefromit.substack.com)

LessOnline 2026

nomagicpill 9 Jun 2026 23:24 UTC
3 points
0 comments, 5 min read, LW link
(nomagicpill.substack.com)

“Programmer Science Fiction: My case for a new sub-genre”, Sam T. Oates 2026

gwern 9 Jun 2026 23:23 UTC
47 points
10 comments, 1 min read, LW link
(stoates.substack.com)

The Disutility of FDT: on Utility Functions and Voting, Insights from Behavioral Economics and Decision Theory

DanielW 9 Jun 2026 23:13 UTC
5 points
3 comments, 8 min read, LW link

Three Labs With a Plan and A Memorandum

Zvi 9 Jun 2026 22:40 UTC
45 points
0 comments, 12 min read, LW link
(thezvi.wordpress.com)

Harmfulness Directions in OLMo

9 Jun 2026 22:31 UTC
20 points
0 comments, 11 min read, LW link

“Self-Control” Is A (Neurological) Type Error

Elliot Callender 9 Jun 2026 21:34 UTC
−6 points
0 comments, 1 min read, LW link

Towards a Formal Scientific Epistemology

Richard_Ngo 9 Jun 2026 20:31 UTC
75 points
9 comments, 7 min read, LW link
(www.mindthefuture.info)

Some Interesting Papers on RLVR

CarolusRenniusVitellius 9 Jun 2026 19:00 UTC
22 points
5 comments, 4 min read, LW link

A Mike’s-Eye View of ARC’s Research

Mikewins 9 Jun 2026 18:30 UTC
64 points
1 comment, 11 min read, LW link
(www.alignment.org)

An LLM Flagged My Paper About LLMs Flagging Things.

Failfinder70 9 Jun 2026 18:00 UTC
5 points
0 comments, 2 min read, LW link

The Skeptic, the Bayesian, Empiricism and Claims to Know:

DanielW 9 Jun 2026 17:52 UTC
4 points
4 comments, 4 min read, LW link