Anthropic Is Taking AI Welfare Seriously. I’m Not Sure It Knows What It’s Measuring.

Failfinder70 13 Jun 2026 20:54 UTC

−1 points

4 comments 3 min read LW link

A cheap specialist judge gets used by agents but fails to reduce alignment audit costs

burnssa 13 Jun 2026 20:38 UTC

8 points

0 comments 8 min read LW link

What is a game?

Isaac Newton 13 Jun 2026 19:51 UTC

2 points

2 comments 8 min read LW link

(archimedeanmonoid.substack.com)

American Government Takes Down Claude Fable

Zvi 13 Jun 2026 19:40 UTC

111 points

13 comments 20 min read LW link

(thezvi.wordpress.com)

Not telling is lying

Fernand0 13 Jun 2026 18:12 UTC

10 points

16 comments 3 min read LW link

A simple argument for trying less hard

Elias Schmied 13 Jun 2026 18:12 UTC

13 points

3 comments 3 min read LW link

How might continual learning affect safety and alignment?

Rauno Arike, RohanS, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

13 Jun 2026 17:34 UTC

59 points

2 comments 16 min read LW link

Presentfulness: Lucidity, Osmosis, and Dissociation

Astrid Callender 13 Jun 2026 17:21 UTC

4 points

2 comments 5 min read LW link

How to Suffer Less

Gordon Seidoh Worley 13 Jun 2026 17:10 UTC

19 points

4 comments 6 min read LW link

(www.uncertainupdates.com)

Somewhat Contra Ted Chiang on AI Consciousness

ThomasJ 13 Jun 2026 16:49 UTC

8 points

0 comments 10 min read LW link

The term “AGI” is almost useless at this point [Linkpost]

Noosphere89 13 Jun 2026 16:15 UTC

30 points

1 comment 5 min read LW link

(helentoner.substack.com)

SFT Drives Gemini’s Safety Properties

Josh Engels, Arthur Conmy, bilalchughtai and Neel Nanda

13 Jun 2026 15:31 UTC

69 points

3 comments 1 min read LW link

Why not take the AI fight to the ground?

less_raichu 13 Jun 2026 15:04 UTC

8 points

5 comments 1 min read LW link

AML for AI as a verification mechanism

MarkelKori 13 Jun 2026 11:59 UTC

9 points

2 comments 2 min read LW link

Pulling hedonic utilitarianism out of ethical emotivism

Bill Jackson 13 Jun 2026 11:50 UTC

6 points

2 comments 6 min read LW link

(billjackson7.substack.com)

Tequila Sunset at the Hog’s Head (A Scene)

Ben Pace 13 Jun 2026 6:53 UTC

22 points

1 comment 5 min read LW link

US government directive to suspend access to Fable 5 and Mythos 5

Capybasilisk 13 Jun 2026 1:16 UTC

67 points

15 comments 1 min read LW link

(www.anthropic.com)

Do we learn less from our decisions than we think we do?

QuietCalibration 13 Jun 2026 1:05 UTC

5 points

0 comments 1 min read LW link

Exploration of a DNA Sequencing Basecaller using Activation Patching

Madeleine L 13 Jun 2026 0:58 UTC

3 points

0 comments 6 min read LW link

Sandy Blvd as an example of complexity

Adam Zerner 13 Jun 2026 0:28 UTC

20 points

0 comments 2 min read LW link

Short Timelines Favor Control, Long Timelines Favor Infrastructure Security

Jannis 13 Jun 2026 0:12 UTC

7 points

0 comments 3 min read LW link

Cat allergies & Cavities

Etha 13 Jun 2026 0:11 UTC

6 points

1 comment 2 min read LW link

When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors

CandidLind 12 Jun 2026 23:20 UTC

8 points

0 comments 27 min read LW link

A Generated Web

Klemen 12 Jun 2026 23:09 UTC

3 points

0 comments 3 min read LW link

The Quest To Find The Next Big Communicators In AI Safety

Akshyae Singh 12 Jun 2026 20:17 UTC

17 points

3 comments 6 min read LW link

Updates on performative misalignment

David Vella Zarb, Rustem, Taywon Min and Shi

12 Jun 2026 20:15 UTC

16 points

0 comments 12 min read LW link

Statistical Physics for Ambitious Interpretability: A Workshop Retrospective

Lauren Greenspan, Lucas Teixeira and ClaudineLim

12 Jun 2026 20:01 UTC

4 points

0 comments 6 min read LW link

Calibrating Activation Vectors using Norm

Kamesh R 12 Jun 2026 19:59 UTC

1 point

0 comments 3 min read LW link

Claude Fable 5 and Mythos 5: The System Card

Zvi 12 Jun 2026 18:50 UTC

48 points

1 comment 29 min read LW link

(thezvi.wordpress.com)

What’s Continual Learning, and Why Might We Expect To See It In Advanced LLM Agents?

RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

12 Jun 2026 18:43 UTC

28 points

2 comments 17 min read LW link

Implications of Continual Learning for LLM Agents: Introduction

RohanS, Rauno Arike, Owen Terry, Achu Menon, Zhijing Jin, Francis Rhys Ward and Seth Herd

12 Jun 2026 18:36 UTC

46 points

0 comments 6 min read LW link

Surplus: for massive public good

Austin Chen 12 Jun 2026 18:10 UTC

11 points

0 comments 4 min read LW link

(surplus.dev)

Reward Hacking at the 1937 World’s Fair

frmsaul 12 Jun 2026 17:47 UTC

36 points

14 comments 3 min read LW link

Bunk in AF

Fernand0 12 Jun 2026 17:41 UTC

6 points

0 comments 1 min read LW link

Building and evaluating model diffing agents

bilalchughtai, Josh Engels and Neel Nanda

12 Jun 2026 17:14 UTC

61 points

2 comments 12 min read LW link

Rational Animations is a 501(c)(3) nonprofit and is looking for board members

Writer 12 Jun 2026 16:47 UTC

7 points

0 comments 2 min read LW link

“AF needs empirical grounding” is a meaningless valley of compromise

Fernand0 12 Jun 2026 16:37 UTC

9 points

3 comments 1 min read LW link

How bad would it be if GPS satellites were shot down?

Jackson Wagner 12 Jun 2026 16:34 UTC

19 points

0 comments 21 min read LW link

Sympathy for both sides of the egregious misalignment debate

Steven Byrnes 12 Jun 2026 16:26 UTC

197 points

26 comments 4 min read LW link

The Uncertainty That Matters Isn’t Fundamental

jimmy 12 Jun 2026 16:23 UTC

30 points

1 comment 13 min read LW link

Citations Needed: Magic Encyclopedias to Save the World

Oliver Sourbut 12 Jun 2026 15:35 UTC

40 points

3 comments 5 min read LW link

(www.oliversourbut.net)

If you, a human, can imagine red and green being swapped, you are probably conscious

vals tutor 12 Jun 2026 13:28 UTC

4 points

19 comments 7 min read LW link

Simulating Simulators

kromem 12 Jun 2026 12:56 UTC

43 points

2 comments 15 min read LW link

Learning to spend money

Yair Halberstadt 12 Jun 2026 6:56 UTC

19 points

1 comment 2 min read LW link

Parkinson’s Heuristic: The Only Time To Do Anything

Ben Pace 12 Jun 2026 6:55 UTC

117 points

8 comments 5 min read LW link

PSA: Almost nobody is directly working on superintelligent alignment

Chi Nguyen and peterbarnett

12 Jun 2026 5:17 UTC

230 points

41 comments 1 min read LW link

Honey is Good

G Wood 12 Jun 2026 4:07 UTC

9 points

4 comments 3 min read LW link

The Aestheticising Vice by Paul Seabright

Linch 12 Jun 2026 2:20 UTC

25 points

2 comments 2 min read LW link

Celene’s thoughts on consciousness

ToasterLightning 12 Jun 2026 0:55 UTC

46 points

34 comments 18 min read LW link

(terminuspoint.substack.com)

Construct validity of Claude Opus 4.8’s System Card – A commentary

Maria Federica Martino Lena 11 Jun 2026 23:33 UTC

8 points

0 comments 16 min read LW link