Coup is the Pareto-optimal social game

Daniel Tan 21 Jun 2026 23:31 UTC

22 points

7 comments 2 min read LW link

Introducing MonitoringBench

monika_j 21 Jun 2026 18:43 UTC

41 points

0 comments 6 min read LW link

How persona training could fail

Simon Lermen 21 Jun 2026 16:38 UTC

13 points

0 comments 4 min read LW link

A high-level model of AI bargaining

Anthony DiGiovanni 21 Jun 2026 15:37 UTC

17 points

1 comment 5 min read LW link

Policy changes should be rolled out gradually

Yair Halberstadt 21 Jun 2026 11:07 UTC

28 points

2 comments 3 min read LW link

A misalignment taxonomy

Alec Harris 21 Jun 2026 10:20 UTC

13 points

2 comments 3 min read LW link

The Cookie Monster Explains AI Safety

michaelwaves 21 Jun 2026 0:52 UTC

12 points

2 comments 2 min read LW link

Google Can’t Math Parsecs

jefftk 21 Jun 2026 0:30 UTC

96 points

0 comments 1 min read LW link

(www.jefftk.com)

How are there 0 studies (maybe 1) on sex-concordant hormone therapy?

Util 20 Jun 2026 22:36 UTC

14 points

0 comments 3 min read LW link

Against Planet-Eating Nanoreplicators

SurvivalBias 20 Jun 2026 20:27 UTC

10 points

7 comments 5 min read LW link

How transparent is DiffusionGemma (and why it matters)

Josh Engels, Callum McDougall, bilalchughtai, János Kramár, Senthooran Rajamanoharan, Arthur Conmy, Rohin Shah and Neel Nanda

20 Jun 2026 20:05 UTC

72 points

2 comments 4 min read LW link

Animal Futures Forecasting Tournament

david reinstein 20 Jun 2026 19:39 UTC

14 points

2 comments 1 min read LW link

The Invisible Side of AI Governance

Charbel-Raphaël 20 Jun 2026 18:54 UTC

100 points

4 comments 14 min read LW link

Would anybody here be interested in a “mistake postmortem” discussion group?

SK2 20 Jun 2026 12:03 UTC

50 points

7 comments 4 min read LW link

Unchickenous Apricot Berry Cake

jefftk 20 Jun 2026 2:20 UTC

22 points

1 comment 1 min read LW link

(www.jefftk.com)

The LLM shoggoth meme is weirder than you think

HedonicEscalator 19 Jun 2026 23:35 UTC

126 points

8 comments 7 min read LW link

(hedonicescalator.substack.com)

How I think developers of frontier AI systems and regulators ought to act in the face of existential AI risk

WilliamKiely 19 Jun 2026 22:22 UTC

12 points

0 comments 12 min read LW link

Hyperstition as the Natural Enemy of Rationality

alseph 19 Jun 2026 21:12 UTC

29 points

8 comments 3 min read LW link

World-modeling the US vs. Anthropic Standoff on Claude Fable

dschwarz 19 Jun 2026 20:04 UTC

20 points

4 comments 8 min read LW link

Thoughts on Likelihood of Existential Risks by Misaligned AIs

Ishan Khire 19 Jun 2026 19:17 UTC

3 points

0 comments 6 min read LW link

(ishankhire.substack.com)

Why should AI be moral?

Zach Thornton 19 Jun 2026 19:13 UTC

12 points

3 comments 9 min read LW link

AI Safety Ecosystem Research notes

Eneasz 19 Jun 2026 18:21 UTC

31 points

1 comment 8 min read LW link

A brief list of ways AI safety efforts could be net negative

Elias Schmied 19 Jun 2026 16:12 UTC

28 points

4 comments 2 min read LW link

Online >> real life for spreading ideas

Bill Jackson 19 Jun 2026 15:44 UTC

12 points

1 comment 2 min read LW link

Typical Minds Aren’t

Gordon Seidoh Worley 19 Jun 2026 15:11 UTC

5 points

6 comments 2 min read LW link

(www.uncertainupdates.com)

San Silvestro

Tomás B. 19 Jun 2026 14:54 UTC

39 points

1 comment 14 min read LW link

(open.substack.com)

Claude Fable 5 and Mythos 5: Capabilities

Zvi 19 Jun 2026 14:40 UTC

30 points

2 comments 38 min read LW link

(thezvi.wordpress.com)

The one-week sprint

Daniel Tan 19 Jun 2026 12:46 UTC

41 points

4 comments 2 min read LW link

Futarchy is insecure without a trusted gatekeeper

distbit 19 Jun 2026 12:22 UTC

2 points

0 comments 10 min read LW link

Patching ~All Security-Relevant Open-Source Software? [niplav 2025]

Quinn 19 Jun 2026 12:13 UTC

15 points

1 comment 1 min read LW link

(forum.effectivealtruism.org)

Cosmological Odyssey

breaker25 19 Jun 2026 5:06 UTC

−12 points

1 comment 3 min read LW link

Research agenda: Interpretive debate

Shi 18 Jun 2026 23:46 UTC

34 points

0 comments 7 min read LW link

Does it feel any different to be reverse-chiral life?

jessicata 18 Jun 2026 22:56 UTC

10 points

0 comments 10 min read LW link

Reinforcement learning towards broadly and persistently beneficial models

papetoast 18 Jun 2026 22:11 UTC

19 points

0 comments 1 min read LW link

(alignment.openai.com)

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

Alek Westover, SebastianP, Alexa Pan and Jozdien

18 Jun 2026 21:21 UTC

57 points

4 comments 5 min read LW link

(blog.redwoodresearch.org)

CoT-forcing promptware

Bruce Middleton 18 Jun 2026 19:33 UTC

2 points

0 comments 2 min read LW link

AI that represents you can’t be neutral.

agulaya24 18 Jun 2026 18:50 UTC

−1 points

2 comments 3 min read LW link

On “Model Organisms”

J Bostock 18 Jun 2026 18:42 UTC

33 points

1 comment 6 min read LW link

Introduction: Gaussian Natural Latents

Haru 18 Jun 2026 18:41 UTC

41 points

2 comments 3 min read LW link

GDM AI Control Roadmap

Mary Phuong, Erik Jenner, Rohin Shah and Seb Farquhar

18 Jun 2026 16:50 UTC

82 points

2 comments 1 min read LW link

Contra Pace on When to Apologize

Zack_M_Davis 18 Jun 2026 16:49 UTC

57 points

27 comments 6 min read LW link

(zackmdavis.net)

Your Model Organisms Might Be Fried

Daniel Tan, J Bostock, draganover, ma-rmartinez, sidbaines and David Africa

18 Jun 2026 16:18 UTC

92 points

6 comments 7 min read LW link

Shard narcissism as delusion of unembededness

Fernand0 18 Jun 2026 14:29 UTC

10 points

1 comment 4 min read LW link

AI #173: AI Pauses

Zvi 18 Jun 2026 13:40 UTC

35 points

2 comments 47 min read LW link

(thezvi.wordpress.com)

War of Dots: CRUSHING my opponents with FACTS and LOGIC

momom2 18 Jun 2026 12:07 UTC

17 points

2 comments 7 min read LW link

How far do open weights trail the frontier?

RobinHa 18 Jun 2026 11:01 UTC

22 points

4 comments 1 min read LW link

(robinhaselhorst.com)

Karlsruhe—LW/ACX Meetup—June 2026

volis 18 Jun 2026 9:55 UTC

1 point

0 comments 1 min read LW link

GLM 5.2 playing text adventures

kqr 18 Jun 2026 7:23 UTC

14 points

1 comment 1 min read LW link

(entropicthoughts.com)

Leveraged on being right

Ben Pace, the Vacationing Vagabond 18 Jun 2026 6:51 UTC

82 points

7 comments 3 min read LW link

Vulnerabilities and exploits: where are we headed?

tchauvin 18 Jun 2026 5:49 UTC

9 points

0 comments 5 min read LW link

(tchauvin.com)