The LLM shoggoth meme is weirder than you think

HedonicEscalator 19 Jun 2026 23:35 UTC

119 points

8 comments, 7 min read, LW link

(hedonicescalator.substack.com)

How I think developers of frontier AI systems and regulators ought to act in the face of existential AI risk

WilliamKiely 19 Jun 2026 22:22 UTC

12 points

0 comments, 12 min read, LW link

Hyperstition as the Natural Enemy of Rationality

alseph 19 Jun 2026 21:12 UTC

32 points

7 comments, 3 min read, LW link

World-modeling the US vs. Anthropic Standoff on Claude Fable

dschwarz 19 Jun 2026 20:04 UTC

18 points

3 comments, 8 min read, LW link

Thoughts on Likelihood of Existential Risks by Misaligned AIs

Ishan Khire 19 Jun 2026 19:17 UTC

3 points

0 comments, 6 min read, LW link

(ishankhire.substack.com)

Why should AI be moral?

Zach Thornton 19 Jun 2026 19:13 UTC

12 points

2 comments, 9 min read, LW link

AI Safety Ecosystem Research notes

Eneasz 19 Jun 2026 18:21 UTC

31 points

1 comment, 8 min read, LW link

A brief list of ways AI safety efforts could be net negative

Elias Schmied 19 Jun 2026 16:12 UTC

28 points

4 comments, 2 min read, LW link

Online >> real life for spreading ideas

Bill Jackson 19 Jun 2026 15:44 UTC

12 points

1 comment, 2 min read, LW link

Typical Minds Aren’t

Gordon Seidoh Worley 19 Jun 2026 15:11 UTC

5 points

6 comments, 2 min read, LW link

(www.uncertainupdates.com)

San Silvestro

Tomás B. 19 Jun 2026 14:54 UTC

39 points

1 comment, 14 min read, LW link

(open.substack.com)

Claude Fable 5 and Mythos 5: Capabilities

Zvi 19 Jun 2026 14:40 UTC

30 points

2 comments, 38 min read, LW link

(thezvi.wordpress.com)

The one-week sprint

Daniel Tan 19 Jun 2026 12:46 UTC

39 points

1 comment, 2 min read, LW link

Futarchy is insecure without a trusted gatekeeper

distbit 19 Jun 2026 12:22 UTC

2 points

0 comments, 10 min read, LW link

Patching ~All Security-Relevant Open-Source Software? [niplav 2025]

Quinn 19 Jun 2026 12:13 UTC

15 points

1 comment, 1 min read, LW link

(forum.effectivealtruism.org)

Cosmological Odyssey

breaker25 19 Jun 2026 5:06 UTC

−12 points

1 comment, 3 min read, LW link

Research agenda: Interpretive debate

Shi 18 Jun 2026 23:46 UTC

30 points

0 comments, 7 min read, LW link

Does it feel any different to be reverse-chiral life?

jessicata 18 Jun 2026 22:56 UTC

10 points

0 comments, 10 min read, LW link

Reinforcement learning towards broadly and persistently beneficial models

papetoast 18 Jun 2026 22:11 UTC

19 points

0 comments, 1 min read, LW link

(alignment.openai.com)

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

Alek Westover, SebastianP, Alexa Pan and Jozdien

18 Jun 2026 21:21 UTC

57 points

4 comments, 5 min read, LW link

(blog.redwoodresearch.org)

CoT-forcing promptware

Bruce Middleton 18 Jun 2026 19:33 UTC

2 points

0 comments, 2 min read, LW link

AI that represents you can’t be neutral.

agulaya24 18 Jun 2026 18:50 UTC

−1 points

2 comments, 3 min read, LW link

On “Model Organisms”

J Bostock 18 Jun 2026 18:42 UTC

31 points

1 comment, 6 min read, LW link

Introduction: Gaussian Natural Latents

Haru 18 Jun 2026 18:41 UTC

41 points

2 comments, 3 min read, LW link

GDM AI Control Roadmap

Mary Phuong, Erik Jenner, Rohin Shah and Seb Farquhar

18 Jun 2026 16:50 UTC

81 points

2 comments, 1 min read, LW link

Contra Pace on When to Apologize

Zack_M_Davis 18 Jun 2026 16:49 UTC

54 points

21 comments, 6 min read, LW link

(zackmdavis.net)

Your Model Organisms Might Be Fried

Daniel Tan, J Bostock, draganover, ma-rmartinez, sidbaines and David Africa

18 Jun 2026 16:18 UTC

84 points

6 comments, 7 min read, LW link

Shard narcissism as delusion of unembededness

Fernand0 18 Jun 2026 14:29 UTC

10 points

1 comment, 4 min read, LW link

AI #173: AI Pauses

Zvi 18 Jun 2026 13:40 UTC

35 points

2 comments, 47 min read, LW link

(thezvi.wordpress.com)

War of Dots: CRUSHING my opponents with FACTS and LOGIC

momom2 18 Jun 2026 12:07 UTC

17 points

2 comments, 7 min read, LW link

How far do open weights trail the frontier?

RobinHa 18 Jun 2026 11:01 UTC

22 points

4 comments, 1 min read, LW link

(robinhaselhorst.com)

Karlsruhe—LW/ACX Meetup—June 2026

volis 18 Jun 2026 9:55 UTC

1 point

0 comments, 1 min read, LW link

GLM 5.2 playing text adventures

kqr 18 Jun 2026 7:23 UTC

14 points

1 comment, 1 min read, LW link

(entropicthoughts.com)

Leveraged on being right

Ben Pace, the Vacationing Vagabond 18 Jun 2026 6:51 UTC

74 points

7 comments, 3 min read, LW link

Vulnerabilities and exploits: where are we headed?

tchauvin 18 Jun 2026 5:49 UTC

9 points

0 comments, 5 min read, LW link

(tchauvin.com)

Agents are under-elicited: A case study in optimization tasks

zef, kaivu, leni and rohuang

18 Jun 2026 2:39 UTC

17 points

1 comment, 7 min read, LW link

(fulcrum.inc)

A preliminary experiment regarding consistency as a measure of conceptual abilities in language models

Chi Nguyen 17 Jun 2026 22:56 UTC

20 points

3 comments, 7 min read, LW link

(casparoesterheld.com)

Kraków Aligned

Tobiasz B and saintgull

17 Jun 2026 20:21 UTC

1 point

0 comments, 1 min read, LW link

Gears for political races

Tom Smith 17 Jun 2026 20:19 UTC

163 points

19 comments, 14 min read, LW link

“Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

Alan Cooney, David Africa and Geoffrey Irving

17 Jun 2026 18:43 UTC

30 points

0 comments, 6 min read, LW link

(arxiv.org)

Porting MACHIAVELLI To Inspect

Koby Lewis 17 Jun 2026 17:58 UTC

7 points

0 comments, 4 min read, LW link

(kobylewis.net)

Several frontier models are substantially prefill aware

yeedrag, Parv Mahajan, David Africa, alexsouly, Jordan Taylor and RobertKirk

17 Jun 2026 17:41 UTC

59 points

2 comments, 5 min read, LW link

Lock-In Risk Needs More Researchers. Here’s Where to Start

Alfie Lamerton 17 Jun 2026 17:33 UTC

12 points

2 comments, 13 min read, LW link

A Geometric Account of Activation Steering through Angle–Norm Decomposition

Atmyre and Georgii Aparin

17 Jun 2026 15:23 UTC

9 points

0 comments, 5 min read, LW link

(atmyre.github.io)

The Once And Future Fable #3: Fix This Code

Zvi 17 Jun 2026 14:10 UTC

62 points

9 comments, 21 min read, LW link

(thezvi.wordpress.com)

Alignment pretraining could backfire

Alexandre Variengien 17 Jun 2026 13:52 UTC

43 points

8 comments, 1 min read, LW link

Toward a Kantian refutation of Agent Foundations

Fernand0 17 Jun 2026 13:30 UTC

9 points

0 comments, 8 min read, LW link

Illusionists should try to build hedonium

Jack Thompson 17 Jun 2026 12:25 UTC

−3 points

6 comments, 9 min read, LW link

(jacktlab.substack.com)

Omission Attacks Project Proposal

Chris Harig 17 Jun 2026 7:08 UTC

2 points

0 comments, 3 min read, LW link

The Financial Ledger Theory of Apologies

Ben Pace, the Vacationing Vagabond 17 Jun 2026 6:57 UTC

46 points

9 comments, 4 min read, LW link