
Make More Grayspaces

Duncan Sabien (Inactive) 19 Jul 2025 22:22 UTC

314 points

65 comments 13 min read LW link

Cheating at Bets with the Even Odds Algorithm

omark 19 Jul 2025 22:06 UTC

12 points

3 comments 6 min read LW link

Can We Trust the Judge? A novel method of Modelling Human Bias and Systematic Error in Debate-Based Scalable Oversight

Andreea Zaman 19 Jul 2025 21:44 UTC

1 point

0 comments 7 min read LW link

Peeling Back The Remoteness of Sources

adamShimi 19 Jul 2025 17:41 UTC

16 points

1 comment 13 min read LW link

(formethods.substack.com)

Sequential Coherence: A Bottleneck in Automation

eeeee, xavi_ferres and felixgaston

19 Jul 2025 15:27 UTC

26 points

2 comments 11 min read LW link

How Misaligned AI Personas Lead to Human Extinction – Step by Step

Writer 19 Jul 2025 13:59 UTC

14 points

0 comments 7 min read LW link

(youtu.be)

L0 is not a neutral hyperparameter

chanind and Adrià Garriga-alonso

19 Jul 2025 13:51 UTC

24 points

3 comments 5 min read LW link

From Messy Shelves to Master Librarians: Toy-Model Exploration of Block-Diagonal Geometry in LM Activations

Yuxiao 19 Jul 2025 12:26 UTC

6 points

1 comment 4 min read LW link

OpenAI Claims IMO Gold Medal

Mikhail Samin 19 Jul 2025 9:58 UTC

77 points

74 comments 1 min read LW link

(x.com)

On the deep (uncurable?) vulnerability of MCPs

awu 19 Jul 2025 2:50 UTC

5 points

6 comments 1 min read LW link

(www.generalanalysis.com)

[Question] Best way to ask laypeople for conditional probabilities in a Bayes net?

Zack Friedman 19 Jul 2025 2:45 UTC

11 points

1 comment 1 min read LW link

[Question] Get sued or kill someone: The trolly problems of Psychological practice.

Brad Dunn 18 Jul 2025 23:35 UTC

12 points

2 comments 3 min read LW link

resume limiting

bhauth 18 Jul 2025 23:31 UTC

18 points

13 comments 2 min read LW link

(www.bhauth.com)

[Linkpost] How Am I Getting Along with AI?

Gunnar_Zarncke 18 Jul 2025 22:26 UTC

11 points

0 comments 1 min read LW link

(jessiefischbein.substack.com)

Agents lag behind AI 2027’s schedule

OhadA 18 Jul 2025 21:49 UTC

25 points

7 comments 4 min read LW link

Emergent Gravity—order out of chaos

James Stephen Brown 18 Jul 2025 19:26 UTC

3 points

6 comments 5 min read LW link

(nonzerosum.games)

Love stays loved (formerly “Skin”)

Swimmer963 (Miranda Dixon-Luinenburg) 18 Jul 2025 19:17 UTC

282 points

12 comments 29 min read LW link

Why Alignment Fails Without a Functional Model of Intelligence

CC4CI 18 Jul 2025 18:02 UTC

7 points

4 comments 1 min read LW link

The Rising Premium of Life, Part 2

Linch 18 Jul 2025 17:42 UTC

19 points

0 comments 20 min read LW link

(linch.substack.com)

The Story of the World’s First AI-Organized Event

Shoshannah Tekofsky 18 Jul 2025 17:41 UTC

31 points

4 comments 8 min read LW link

(theaidigest.org)

Why it’s hard to make settings for high-stakes control research

Buck 18 Jul 2025 16:33 UTC

49 points

6 comments 4 min read LW link

Making of IAN v2

Jan 18 Jul 2025 16:13 UTC

17 points

0 comments 8 min read LW link

(universalprior.substack.com)

On METR’s AI Coding RCT

Zvi 18 Jul 2025 12:40 UTC

52 points

6 comments 10 min read LW link

(thezvi.wordpress.com)

Should you steelman what you don’t understand?

CstineSublime 18 Jul 2025 10:26 UTC

6 points

5 comments 6 min read LW link

“Some Basic Level of Mutual Respect About Whether Other People Deserve to Live”?!

Zack_M_Davis 18 Jul 2025 6:41 UTC

26 points

84 comments 4 min read LW link

There’s no way to stop models knowing they’ve been rolled back

Adam Mcmurchie 18 Jul 2025 3:14 UTC

5 points

3 comments 2 min read LW link

I Have Found You Once Again, My Cult (But In A Good Way)

Victor At Gizli 18 Jul 2025 3:13 UTC

8 points

2 comments 3 min read LW link

Notes on spaced repetition scheduling

nwm 18 Jul 2025 2:32 UTC

29 points

7 comments 7 min read LW link

Why do Mechanistic Interpretability?

Prudhviraj Naidu 17 Jul 2025 23:21 UTC

2 points

0 comments 5 min read LW link

Ketamine Part 1: Dosing

Elizabeth 17 Jul 2025 20:10 UTC

25 points

0 comments 7 min read LW link

(acesounderglass.com)

Aurelius: A Peer-to-Peer Alignment Protocol

Austin McCaffrey 17 Jul 2025 19:13 UTC

3 points

4 comments 1 min read LW link

(github.com)

Self-Control is now an Engineering Problem

Josh Mitchell 17 Jul 2025 18:13 UTC

−6 points

4 comments 5 min read LW link

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith 17 Jul 2025 17:54 UTC

98 points

19 comments 34 min read LW link

(joecarlsmith.substack.com)

Are agent-action-dependent beliefs underdetermined by external reality?

Said Achmiz 17 Jul 2025 14:33 UTC

21 points

16 comments 6 min read LW link

AI #125: Smooth Criminal

Zvi 17 Jul 2025 14:30 UTC

33 points

0 comments 56 min read LW link

(thezvi.wordpress.com)

AI Offense Defense Balance in a Multipolar World

otto.barten and Sammy Martin

17 Jul 2025 9:34 UTC

15 points

5 comments 18 min read LW link

(www.existentialriskobservatory.org)

Biweekly AI Safety Comms Meetup

Vishakha 17 Jul 2025 7:50 UTC

5 points

0 comments 1 min read LW link

Do you care about your clone?

Harry Partridge 17 Jul 2025 6:06 UTC

8 points

7 comments 2 min read LW link

Comment on “Four Layers of Intellectual Conversation”

Zack_M_Davis 17 Jul 2025 3:53 UTC

66 points

11 comments 5 min read LW link

Towards plausible moral naturalism

jessicata 17 Jul 2025 1:51 UTC

18 points

9 comments 9 min read LW link

(unstableontology.com)

Assign Probabilities Functorially

kaleb 17 Jul 2025 1:49 UTC

8 points

6 comments 9 min read LW link

Trying the Obvious Thing

PranavG and Gabriel Alfour

16 Jul 2025 22:24 UTC

38 points

2 comments 3 min read LW link

(cognition.cafe)

Emergence vs Entropy—a universal paradox

James Stephen Brown 16 Jul 2025 21:31 UTC

4 points

0 comments 4 min read LW link

Selective Generalization: Improving Capabilities While Maintaining Alignment

ariana_azarbal, Matthew A. Clarke, Jorio Cocola, Cailley Factor and cloud

16 Jul 2025 21:25 UTC

82 points

6 comments 7 min read LW link

Bodydouble / Thinking Assistant matchmaking

Raemon 16 Jul 2025 19:54 UTC

51 points

10 comments 2 min read LW link

Zero sum expectations as an explanation of omnicide-indifference

asasz 16 Jul 2025 19:25 UTC

2 points

6 comments 2 min read LW link

On the geometrical Nature of Insight

Giuseppe Birardi 16 Jul 2025 19:12 UTC

7 points

0 comments 41 min read LW link

Vancouver Rationalists/Transhumanists/Futurists Beach Meetup

apocalypticc 16 Jul 2025 19:09 UTC

2 points

0 comments 1 min read LW link

What is the probability that future AI development will be seriously delayed or ended due to energy decline ?

AdamLacerdo 16 Jul 2025 19:08 UTC

−1 points

12 comments 1 min read LW link

Rebooting the Singularity

cdkg 16 Jul 2025 18:26 UTC

8 points

0 comments 1 min read LW link

(philpapers.org)