10 Jun 2026 15:37 UTC

276 points

(sequent.org)

Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models

Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask, Twm Stone, Josh Hills, Ida Caspary and Shubhorup Biswas

10 Jun 2026 17:58 UTC

237 points

20 comments 4 min read LW link

PSA: Almost nobody is directly working on superintelligent alignment

Chi Nguyen and peterbarnett

12 Jun 2026 5:17 UTC

230 points

41 comments 1 min read LW link

Sympathy for both sides of the egregious misalignment debate

Steven Byrnes

12 Jun 2026 16:26 UTC

197 points

26 comments 4 min read LW link

My favorite depiction of utopia

Caleb Biddulph

2 Jun 2026 23:15 UTC

189 points

20 comments 33 min read LW link

(docs.google.com)

The Machines Lack Honour

Raymond Douglas

9 Jun 2026 15:30 UTC

169 points

21 comments 12 min read LW link

Announcing the ARC White-Box Estimation Challenge

Jacob_Hilton, paulfchristiano and Wilson Wu

2 Jun 2026 16:20 UTC

165 points

15 comments 3 min read LW link

(www.alignment.org)

Even “illegible” Mythos reasoning traces seem pretty legible

faul_sname

10 Jun 2026 8:49 UTC

157 points

23 comments 2 min read LW link

A frontier AI company should shut down

MichaelDickens

15 Jun 2026 16:56 UTC

135 points

37 comments 2 min read LW link

Dissolving the Deep Learning Sample Efficiency Gap

Samuel Knoche

1 Jun 2026 18:44 UTC

124 points

24 comments 17 min read LW link

(theraptureofthenerds.substack.com)

Machinic Psychopharmacology: Do LLMs Self-Medicate?

Sid Black and Joseph Bloom

10 Jun 2026 14:15 UTC

124 points

11 comments 23 min read LW link

Can activation verbalizers surface an internal chain of thought?

oakhu and ryan_greenblatt

7 Jun 2026 4:24 UTC

122 points

0 comments 16 min read LW link

Parkinson’s Heuristic: The Only Time To Do Anything

Ben Pace

12 Jun 2026 6:55 UTC

117 points

8 comments 5 min read LW link

Why Software Automation Is Hard

silentbob

6 Jun 2026 8:56 UTC

114 points

20 comments 12 min read LW link

American Government Takes Down Claude Fable

Zvi

13 Jun 2026 19:40 UTC

111 points

13 comments 20 min read LW link

(thezvi.wordpress.com)

Agent Foundations Reminds Me of Continental Philosophy

IanWS

2 Jun 2026 14:34 UTC

106 points

15 comments 5 min read LW link

(write.ianwsperber.com)

Learnings from starting an AI safety research team

draganover and Erin Robertson

5 Jun 2026 16:27 UTC

97 points

7 comments 6 min read LW link

Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment

Sayhan Yalvaçer

8 Jun 2026 7:06 UTC

96 points

8 comments 3 min read LW link

One Year of PauseAI UK

Joseph Miller and PauseAI UK

5 Jun 2026 16:41 UTC

94 points

7 comments 11 min read LW link

(pauseai.uk)

Gears for political races

Tom Smith

17 Jun 2026 20:19 UTC

93 points

5 comments 14 min read LW link

“Contagious Humming” to Silence a Room

JohnofCharleston

1 Jun 2026 19:08 UTC

90 points

20 comments 2 min read LW link

The Hidden Structures of Problems

spencerg

14 Jun 2026 13:51 UTC

90 points

9 comments 3 min read LW link

(www.spencergreenberg.com)

Guardian Angels: LLM Personalization for Productivity and Security

gwern

17 Jun 2026 3:21 UTC

83 points

7 comments 2 min read LW link

(gwern.net)

Does preservation make sense before we know how to revive?

Aurelia

15 Jun 2026 23:40 UTC

82 points

2 comments 25 min read LW link

Anthropic did not call for a pause on AI

Andrea_Miotti and Gabriel Alfour

10 Jun 2026 20:02 UTC

80 points

5 comments 5 min read LW link

(controlai.news)

Models May Behave Worse When Eval Aware

Senthooran Rajamanoharan and Neel Nanda

11 Jun 2026 9:28 UTC

80 points

7 comments 13 min read LW link

Why Even Experts Don’t Know What to Do About AI Risk

Luc Brinkman and plex

2 Jun 2026 17:31 UTC

78 points

22 comments 2 min read LW link

Towards a Formal Scientific Epistemology

Richard_Ngo

9 Jun 2026 20:31 UTC

75 points

9 comments 7 min read LW link

(www.mindthefuture.info)

The Three Filters: Why Almost Every Plan to Survive ASI Fails Miserably

Alex Amadori

10 Jun 2026 9:44 UTC

73 points

26 comments 16 min read LW link

(alexamadori.substack.com)

China won’t win the AI race but would it be much worse if it did?

Chastity Ruth

3 Jun 2026 5:46 UTC

71 points

18 comments 13 min read LW link

Scaling Hypothesis #2: Are Humans Just More Over-Parameterized?

gwern

17 Jun 2026 2:53 UTC

71 points

12 comments 1 min read LW link

(gwern.net)

The Once And Future Fable #2

Zvi

15 Jun 2026 16:00 UTC

71 points

8 comments 23 min read LW link

(thezvi.wordpress.com)

SFT Drives Gemini’s Safety Properties

Josh Engels, Arthur Conmy, bilalchughtai and Neel Nanda

13 Jun 2026 15:31 UTC

69 points

3 comments 1 min read LW link

The Financial Ledger Theory of Apologies

Ben Pace

17 Jun 2026 6:57 UTC

68 points

6 comments 4 min read LW link

Some humans are both male and female, and can (but shouldn’t) have children with themselves

HedonicEscalator

1 Jun 2026 1:51 UTC

68 points

14 comments 6 min read LW link

(hedonicescalator.substack.com)

US government directive to suspend access to Fable 5 and Mythos 5

Capybasilisk

13 Jun 2026 1:16 UTC

67 points

15 comments 1 min read LW link

(www.anthropic.com)

Against Corrigibility

peralice

6 Jun 2026 20:28 UTC

66 points

17 comments 12 min read LW link

You Can Catch Sleeper Agents by Teaching Another Model to Imitate Them

RobinHa

10 Jun 2026 15:21 UTC

65 points

5 comments 9 min read LW link

(robinhaselhorst.com)

A Mike’s-Eye View of ARC’s Research

Mikewins

9 Jun 2026 18:30 UTC

64 points

1 comment 11 min read LW link

(www.alignment.org)

Building Better Activation Oracles

ceselder, Jan Bauer, Niclas Luick, Adam Karvonen and Neel Nanda

4 Jun 2026 18:34 UTC

62 points

1 comment 7 min read LW link

What if Anthropic unilaterally paused capabilities development right now?

Karl von Wendt

6 Jun 2026 7:39 UTC

61 points

15 comments 3 min read LW link

Building and evaluating model diffing agents

bilalchughtai, Josh Engels and Neel Nanda

12 Jun 2026 17:14 UTC

61 points

2 comments 12 min read LW link

Beyond the lexical personality traits: What is the structure of personality?

tailcalled

5 Jun 2026 19:05 UTC

60 points

1 comment 5 min read LW link

How might continual learning affect safety and alignment?

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

13 Jun 2026 17:34 UTC

59 points

2 comments 16 min read LW link

Coming Around To Political Donations

jefftk

6 Jun 2026 21:30 UTC

59 points

8 comments 2 min read LW link

(www.jefftk.com)

How to build a cancer vaccine, and whether they will work this time

Abhishaike Mahajan

8 Jun 2026 20:45 UTC

58 points

9 comments 25 min read LW link

(www.owlposting.com)

Synthetic document finetuning for instilling positive traits

CallumMcDougall, Arthur Conmy and Neel Nanda

16 Jun 2026 0:04 UTC

57 points

1 comment 10 min read LW link

(Mis)generalization of Helpful-Only Fine-tuning

Omar Khursheed, Baram Sosis and Fabien Roger

4 Jun 2026 18:40 UTC

55 points

7 comments 11 min read LW link

Opus 4.8 Part 2: Model Welfare

Zvi

1 Jun 2026 15:11 UTC

55 points

1 comment 25 min read LW link

(thezvi.wordpress.com)

Several frontier models are substantially prefill aware

yeedrag, Parv Mahajan, David Africa, alexsouly, Jordan Taylor and RobertKirk

17 Jun 2026 17:41 UTC

54 points

1 comment 5 min read LW link