8 points

0 comments, 16 min read

you won’t one-shot a perfect system, but try anyway

PossiblyElaine

11 Jun 2026 22:43 UTC

7 points

1 comment, 4 min read

(possiblyelaine.substack.com)

Announcing the Next Phase of AI Forge

Mike Vaiana, johnclund and Diogo de Lucena

11 Jun 2026 21:27 UTC

11 points

0 comments, 2 min read

The long arc of alignment: second-order instrumental convergence

Emma Leonhart

11 Jun 2026 21:12 UTC

−2 points

0 comments, 3 min read

Newcomb’s problem from the grand-system and petty-system views

transhumanist_atom_understander

11 Jun 2026 20:58 UTC

12 points

0 comments, 5 min read

[New Paper] Prioritizing Risks from AI: A Delphi Study of 272 Experts

peterslattery

11 Jun 2026 20:57 UTC

14 points

0 comments, 2 min read

(airisk.mit.edu)

Telepathy Is (Algorithmically) Easy

Elliot Callender

11 Jun 2026 20:31 UTC

4 points

5 comments, 4 min read

Mortgage rate: 6.5% If indexed: 1.2%. Three Nobelists approve.

Bruce Middleton

11 Jun 2026 20:31 UTC

5 points

2 comments, 2 min read

[Question] Becoming a Researcher in a Non-EA-Priority Field vs Donating $100k / Year to EA Research?

Master Chief

11 Jun 2026 19:22 UTC

8 points

0 comments, 1 min read

AI #172: The First Fable

Zvi

11 Jun 2026 19:00 UTC

44 points

2 comments, 34 min read

(thezvi.wordpress.com)

Failing to Ragebait the New Gemma

Neil Shah, David Africa and arav-dhoot

11 Jun 2026 17:50 UTC

30 points

0 comments, 3 min read

Curating and evaluating high-impact legal research (Unjournal progress, resources)

david reinstein

11 Jun 2026 11:42 UTC

11 points

0 comments, 1 min read

(info.unjournal.org)

Models May Behave Worse When Eval Aware

Senthooran Rajamanoharan and Neel Nanda

11 Jun 2026 9:28 UTC

86 points

7 comments, 13 min read

Becoming a Researcher in a Non-EA-Priority Field vs Donating $100k / Year to EA Research

Master Chief

11 Jun 2026 2:28 UTC

8 points

0 comments, 1 min read

Inverse Rubric Optimization: A testbed for agent science

zef, leni, kaivu and rohuang

11 Jun 2026 1:44 UTC

9 points

0 comments, 1 min read

(fulcrum.inc)

Drawing Big Bright Lines for Cyber & Biological AI

Austin Morrissey

11 Jun 2026 0:55 UTC

−5 points

0 comments, 4 min read

Predictive Processing: Conscious when Training

Chamod Kalupahana

11 Jun 2026 0:06 UTC

13 points

1 comment, 2 min read

Thoughts on Claude Fable’s silent safeguards

Andy Arditi

10 Jun 2026 23:35 UTC

51 points

20 comments, 10 min read

Notes on Algorithms

Menotim

10 Jun 2026 23:28 UTC

7 points

0 comments, 25 min read

[Question] Fuel Crisis: Situation Modeling Thread

Nicholas Kross

10 Jun 2026 21:59 UTC

8 points

7 comments, 1 min read

[Question] Fuel Crisis: Justified Practical Advice Thread

Nicholas Kross

10 Jun 2026 21:59 UTC

14 points

0 comments, 1 min read

Solsong Chord Updates

jefftk

10 Jun 2026 21:00 UTC

10 points

0 comments, 1 min read

(www.jefftk.com)

Dario Amodei—Policy on the AI Exponential

DW11

10 Jun 2026 20:56 UTC

22 points

0 comments, 1 min read

Anthropic did not call for a pause on AI

Andrea_Miotti and Gabriel Alfour

10 Jun 2026 20:02 UTC

80 points

5 comments, 5 min read

(controlai.news)

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary and Shubhorup Biswas

10 Jun 2026 17:58 UTC

248 points

20 comments, 4 min read

These Three Thaumata

chaosmage

10 Jun 2026 16:42 UTC

11 points

0 comments, 1 min read

Sequent: scale and automation for higher confidence in alignment

Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi and Stan van Wingerden

10 Jun 2026 15:37 UTC

277 points

2 comments, 11 min read

(sequent.org)

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

RobinHa

10 Jun 2026 15:21 UTC

66 points

5 comments, 9 min read

(robinhaselhorst.com)

I Started an AI Safety Research Org and Think These 7 Things Matter

Alfie Lamerton

10 Jun 2026 14:54 UTC

20 points

0 comments, 5 min read

Phonies

IanWS

10 Jun 2026 14:17 UTC

10 points

0 comments, 2 min read

(write.ianwsperber.com)

Machinic Psychopharmacology: Do LLMs Self-Medicate?

Sid Black and Joseph Bloom

10 Jun 2026 14:15 UTC

124 points

11 comments, 23 min read

I didn’t see any METR graph extrapolations so here.

Vermillion

10 Jun 2026 12:50 UTC

15 points

2 comments, 1 min read

ML4Good Summer 2026 Bootcamps - Applications Open!

Jack_S

10 Jun 2026 11:07 UTC

3 points

0 comments, 2 min read

Tracing Eval-Awareness Emergence Through Training of OLMo 3

Ram Bharadwaj and RobertKirk

10 Jun 2026 10:13 UTC

43 points

6 comments, 6 min read

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

Alex Amadori

10 Jun 2026 9:44 UTC

74 points

26 comments, 16 min read

(alexamadori.substack.com)

Three types of model organism

Francis Rhys Ward

10 Jun 2026 8:50 UTC

51 points

7 comments, 2 min read

Even “illegible” Mythos reasoning traces seem pretty legible

faul_sname

10 Jun 2026 8:49 UTC

160 points

23 comments, 2 min read

MythOS—The Rise of AGI

Byron Lee

10 Jun 2026 6:06 UTC

−19 points

0 comments, 4 min read

Under Violet

Hide

10 Jun 2026 1:30 UTC

4 points

0 comments, 10 min read

(hidefromit.substack.com)

LessOnline 2026

nomagicpill

9 Jun 2026 23:24 UTC

3 points

0 comments, 5 min read

(nomagicpill.substack.com)

“Programmer Science Fiction: My case for a new sub-genre”, Sam T. Oates 2026

gwern

9 Jun 2026 23:23 UTC

47 points

10 comments, 1 min read

(stoates.substack.com)

The Disutility of FDT: on Utility Functions and Voting, Insights from Behavioral Economics and Decision Theory

DanielW

9 Jun 2026 23:13 UTC

5 points

3 comments, 8 min read

Three Labs With a Plan and A Memorandum

Zvi

9 Jun 2026 22:40 UTC

45 points

0 comments, 12 min read

(thezvi.wordpress.com)

Harmfulness Directions in OLMo

Daniele Pace, Bryan Maruyama and LorenzoPacchiardi

9 Jun 2026 22:31 UTC

20 points

0 comments, 11 min read

“Self-Control” Is A (Neurological) Type Error

Elliot Callender

9 Jun 2026 21:34 UTC

−6 points

0 comments, 1 min read

Towards a Formal Scientific Epistemology

Richard_Ngo

9 Jun 2026 20:31 UTC

75 points

9 comments, 7 min read

(www.mindthefuture.info)

Some Interesting Papers on RLVR

CarolusRenniusVitellius

9 Jun 2026 19:00 UTC

22 points

5 comments, 4 min read

A Mike’s-Eye View of ARC’s Research

Mikewins

9 Jun 2026 18:30 UTC

64 points

1 comment, 11 min read

(www.alignment.org)

An LLM Flagged My Paper About LLMs Flagging Things.

Failfinder70

9 Jun 2026 18:00 UTC

5 points

0 comments, 2 min read

The Skeptic, the Bayesian, Empiricism and Claims to Know:

DanielW

9 Jun 2026 17:52 UTC

4 points

4 comments, 4 min read