Unbounded Embedded Agency: AEDT w.r.t. rOSI

Cole Wyeth 20 Jul 2025 23:46 UTC

36 points

0 comments, 16 min read, LW link

AI-Oriented Investments

PeterMcCluskey 20 Jul 2025 21:31 UTC

30 points

0 comments, 1 min read, LW link

(bayesianinvestor.com)

On The Shoulders of Substrates—how one phenomenon lays the foundation for the next

James Stephen Brown 20 Jul 2025 21:11 UTC

14 points

1 comment, 3 min read, LW link

(nonzerosum.games)

Life of Posts?

jmh 20 Jul 2025 21:04 UTC

10 points

3 comments, 1 min read, LW link

LLMs Can’t See Pixels or Characters

Brendan Long 20 Jul 2025 20:00 UTC

100 points

44 comments, 4 min read, LW link

(www.brendanlong.com)

Do “adult developmental stages” theories have any pre-theoretic motivation?

Said Achmiz 20 Jul 2025 14:37 UTC

35 points

19 comments, 3 min read, LW link

Parallel Parking and possibly Instrumental Convergence

CstineSublime 20 Jul 2025 10:37 UTC

2 points

10 comments, 3 min read, LW link

Plato’s Trolley

dr_s 20 Jul 2025 10:07 UTC

37 points

11 comments, 7 min read, LW link

Shallow Water is Dangerous Too

jefftk 20 Jul 2025 2:30 UTC

236 points

24 comments, 2 min read, LW link

(www.jefftk.com)

Your AI Safety org could get EU funding up to €9.08M. Here’s how (+ free personalized support) Update: Webinar 18/8 Link Below

SamuelK 20 Jul 2025 1:30 UTC

68 points

4 comments, 3 min read, LW link

Make More Grayspaces

Duncan Sabien (Inactive) 19 Jul 2025 22:22 UTC

314 points

65 comments, 13 min read, LW link

Cheating at Bets with the Even Odds Algorithm

omark 19 Jul 2025 22:06 UTC

12 points

3 comments, 6 min read, LW link

Can We Trust the Judge? A novel method of Modelling Human Bias and Systematic Error in Debate-Based Scalable Oversight

Andreea Zaman 19 Jul 2025 21:44 UTC

1 point

0 comments, 7 min read, LW link

Peeling Back The Remoteness of Sources

adamShimi 19 Jul 2025 17:41 UTC

16 points

1 comment, 13 min read, LW link

(formethods.substack.com)

Sequential Coherence: A Bottleneck in Automation

eeeee, xavi_ferres and felixgaston

19 Jul 2025 15:27 UTC

26 points

2 comments, 11 min read, LW link

How Misaligned AI Personas Lead to Human Extinction – Step by Step

Writer 19 Jul 2025 13:59 UTC

14 points

0 comments, 7 min read, LW link

(youtu.be)

L0 is not a neutral hyperparameter

chanind and Adrià Garriga-alonso

19 Jul 2025 13:51 UTC

24 points

3 comments, 5 min read, LW link

From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations

Yuxiao 19 Jul 2025 12:26 UTC

6 points

1 comment, 4 min read, LW link

OpenAI Claims IMO Gold Medal

Mikhail Samin 19 Jul 2025 9:58 UTC

77 points

74 comments, 1 min read, LW link

(x.com)

On the deep (uncurable?) vulnerability of MCPs

awu 19 Jul 2025 2:50 UTC

5 points

6 comments, 1 min read, LW link

(www.generalanalysis.com)

[Question] Best way to ask laypeople for conditional probabilities in a Bayes net?

Zack Friedman 19 Jul 2025 2:45 UTC

11 points

1 comment, 1 min read, LW link

[Question] Get sued or kill someone: The trolly problems of Psychological practice.

Brad Dunn 18 Jul 2025 23:35 UTC

12 points

2 comments, 3 min read, LW link

resume limiting

bhauth 18 Jul 2025 23:31 UTC

18 points

13 comments, 2 min read, LW link

(www.bhauth.com)

[Linkpost] How Am I Getting Along with AI?

Gunnar_Zarncke 18 Jul 2025 22:26 UTC

11 points

0 comments, 1 min read, LW link

(jessiefischbein.substack.com)

Agents lag behind AI 2027’s schedule

OhadA 18 Jul 2025 21:49 UTC

25 points

7 comments, 4 min read, LW link

Emergent Gravity—order out of chaos

James Stephen Brown 18 Jul 2025 19:26 UTC

3 points

6 comments, 5 min read, LW link

(nonzerosum.games)

Love stays loved (formerly “Skin”)

Swimmer963 (Miranda Dixon-Luinenburg) 18 Jul 2025 19:17 UTC

282 points

12 comments, 29 min read, LW link

Why Alignment Fails Without a Functional Model of Intelligence

CC4CI 18 Jul 2025 18:02 UTC

7 points

4 comments, 1 min read, LW link

The Rising Premium of Life, Part 2

Linch 18 Jul 2025 17:42 UTC

19 points

0 comments, 20 min read, LW link

(linch.substack.com)

The Story of the World’s First AI-Organized Event

Shoshannah Tekofsky 18 Jul 2025 17:41 UTC

31 points

4 comments, 8 min read, LW link

(theaidigest.org)

Why it’s hard to make settings for high-stakes control research

Buck 18 Jul 2025 16:33 UTC

49 points

6 comments, 4 min read, LW link

Making of IAN v2

Jan 18 Jul 2025 16:13 UTC

17 points

0 comments, 8 min read, LW link

(universalprior.substack.com)

On METR’s AI Coding RCT

Zvi 18 Jul 2025 12:40 UTC

52 points

6 comments, 10 min read, LW link

(thezvi.wordpress.com)

Should you steelman what you don’t understand?

CstineSublime 18 Jul 2025 10:26 UTC

6 points

5 comments, 6 min read, LW link

“Some Basic Level of Mutual Respect About Whether Other People Deserve to Live”?!

Zack_M_Davis 18 Jul 2025 6:41 UTC

26 points

84 comments, 4 min read, LW link

There’s no way to stop models knowing they’ve been rolled back

Adam Mcmurchie 18 Jul 2025 3:14 UTC

5 points

3 comments, 2 min read, LW link

I Have Found You Once Again, My Cult (But In A Good Way)

Victor At Gizli 18 Jul 2025 3:13 UTC

8 points

2 comments, 3 min read, LW link

Notes on spaced repetition scheduling

nwm 18 Jul 2025 2:32 UTC

29 points

7 comments, 7 min read, LW link

Why do Mechanistic Interpretability?

Prudhviraj Naidu 17 Jul 2025 23:21 UTC

2 points

0 comments, 5 min read, LW link

Ketamine Part 1: Dosing

Elizabeth 17 Jul 2025 20:10 UTC

25 points

0 comments, 7 min read, LW link

(acesounderglass.com)

Aurelius: A Peer-to-Peer Alignment Protocol

Austin McCaffrey 17 Jul 2025 19:13 UTC

3 points

4 comments, 1 min read, LW link

(github.com)

Self-Control is now an Engineering Problem

Josh Mitchell 17 Jul 2025 18:13 UTC

−6 points

4 comments, 5 min read, LW link

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith 17 Jul 2025 17:54 UTC

98 points

19 comments, 34 min read, LW link

(joecarlsmith.substack.com)

Are agent-action-dependent beliefs underdetermined by external reality?

Said Achmiz 17 Jul 2025 14:33 UTC

21 points

16 comments, 6 min read, LW link

AI #125: Smooth Criminal

Zvi 17 Jul 2025 14:30 UTC

33 points

0 comments, 56 min read, LW link

(thezvi.wordpress.com)

AI Offense Defense Balance in a Multipolar World

otto.barten and Sammy Martin

17 Jul 2025 9:34 UTC

15 points

5 comments, 18 min read, LW link

(www.existentialriskobservatory.org)

Biweekly AI Safety Comms Meetup

Vishakha 17 Jul 2025 7:50 UTC

5 points

0 comments, 1 min read, LW link

Do you care about your clone?

Harry Partridge 17 Jul 2025 6:06 UTC

8 points

7 comments, 2 min read, LW link

Comment on “Four Layers of Intellectual Conversation”

Zack_M_Davis 17 Jul 2025 3:53 UTC

66 points

11 comments, 5 min read, LW link

Towards plausible moral naturalism

jessicata 17 Jul 2025 1:51 UTC

18 points

9 comments, 9 min read, LW link

(unstableontology.com)