Why do Mechanistic Interpretability?

Prudhviraj Naidu

17 Jul 2025 23:21 UTC

2 points

0 comments · 5 min read · LW link

Ketamine Part 1: Dosing

Elizabeth

17 Jul 2025 20:10 UTC

25 points

0 comments · 7 min read · LW link

(acesounderglass.com)

Aurelius: A Peer-to-Peer Alignment Protocol

Austin McCaffrey

17 Jul 2025 19:13 UTC

3 points

4 comments · 1 min read · LW link

(github.com)

Self-Control is now an Engineering Problem

Josh Mitchell

17 Jul 2025 18:13 UTC

−6 points

4 comments · 5 min read · LW link

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith

17 Jul 2025 17:54 UTC

98 points

19 comments · 34 min read · LW link

(joecarlsmith.substack.com)

Are agent-action-dependent beliefs underdetermined by external reality?

Said Achmiz

17 Jul 2025 14:33 UTC

21 points

16 comments · 6 min read · LW link

AI #125: Smooth Criminal

Zvi

17 Jul 2025 14:30 UTC

33 points

0 comments · 56 min read · LW link

(thezvi.wordpress.com)

AI Offense Defense Balance in a Multipolar World

otto.barten and Sammy Martin

17 Jul 2025 9:34 UTC

15 points

5 comments · 18 min read · LW link

(www.existentialriskobservatory.org)

Biweekly AI Safety Comms Meetup

Vishakha

17 Jul 2025 7:50 UTC

5 points

0 comments · 1 min read · LW link

Do you care about your clone?

Harry Partridge

17 Jul 2025 6:06 UTC

8 points

7 comments · 2 min read · LW link

Comment on “Four Layers of Intellectual Conversation”

Zack_M_Davis

17 Jul 2025 3:53 UTC

66 points

11 comments · 5 min read · LW link

Towards plausible moral naturalism

jessicata

17 Jul 2025 1:51 UTC

18 points

9 comments · 9 min read · LW link

(unstableontology.com)

Assign Probabilities Functorially

kaleb

17 Jul 2025 1:49 UTC

8 points

6 comments · 9 min read · LW link

Trying the Obvious Thing

PranavG and Gabriel Alfour

16 Jul 2025 22:24 UTC

38 points

2 comments · 3 min read · LW link

(cognition.cafe)

Emergence vs Entropy—a universal paradox

James Stephen Brown

16 Jul 2025 21:31 UTC

4 points

0 comments · 4 min read · LW link

Selective Generalization: Improving Capabilities While Maintaining Alignment

ariana_azarbal, Matthew A. Clarke, Jorio Cocola, Cailley Factor and cloud

16 Jul 2025 21:25 UTC

82 points

6 comments · 7 min read · LW link

Bodydouble / Thinking Assistant matchmaking

Raemon

16 Jul 2025 19:54 UTC

51 points

10 comments · 2 min read · LW link

Zero sum expectations as an explanation of omnicide-indifference

asasz

16 Jul 2025 19:25 UTC

2 points

6 comments · 2 min read · LW link

On the geometrical Nature of Insight

Giuseppe Birardi

16 Jul 2025 19:12 UTC

7 points

0 comments · 41 min read · LW link

Vancouver Rationalists/Transhumanists/Futurists Beach Meetup

apocalypticc

16 Jul 2025 19:09 UTC

2 points

0 comments · 1 min read · LW link

What is the probability that future AI development will be seriously delayed or ended due to energy decline?

AdamLacerdo

16 Jul 2025 19:08 UTC

−1 points

12 comments · 1 min read · LW link

Rebooting the Singularity

cdkg

16 Jul 2025 18:26 UTC

8 points

0 comments · 1 min read · LW link

(philpapers.org)

Being and Existence

Gordon Seidoh Worley

16 Jul 2025 18:10 UTC

7 points

0 comments · 3 min read · LW link

(uncertainupdates.substack.com)

Kimi K2

Zvi

16 Jul 2025 16:20 UTC

54 points

5 comments · 12 min read · LW link

(thezvi.wordpress.com)

[Question] How should Canada Negotiate with Trump on Tariffs?

Davey

16 Jul 2025 15:56 UTC

1 point

2 comments · 1 min read · LW link

[Question] Why haven’t we auto-translated all AI alignment content?

Algon

16 Jul 2025 15:33 UTC

22 points

10 comments · 1 min read · LW link

A Hallucination Filter Idea That Might Not Scale—Yet

8harath

16 Jul 2025 14:40 UTC

−5 points

0 comments · 2 min read · LW link

On being sort of back and sort of new here

Loki zen

16 Jul 2025 12:55 UTC

32 points

13 comments · 3 min read · LW link

Conway’s Game of Life—complexity emerges from simplicity

James Stephen Brown

16 Jul 2025 4:42 UTC

3 points

1 comment · 2 min read · LW link

(nonzerosum.games)

Emergent Price-Fixing by LLM Auction Agents

Lech Mazur

16 Jul 2025 2:45 UTC

14 points

0 comments · 9 min read · LW link

Mapping Mental Moves

Jordan Rubin

16 Jul 2025 2:28 UTC

3 points

0 comments · 2 min read · LW link

(jordanmrubin.substack.com)

Defining Monitorable and Useful Goals

Rubi J. Hudson

15 Jul 2025 23:06 UTC

15 points

0 comments · 16 min read · LW link

[Question] Do you have any recommendations for readings on global risk forecasting and analysis applied to public policy design on a slightly smaller scale, or for more specific objectives?

Ana Lopez

15 Jul 2025 22:00 UTC

1 point

0 comments · 1 min read · LW link

1 week fast on livestream for AI xrisk

samuelshadrach

15 Jul 2025 21:36 UTC

1 point

2 comments · 1 min read · LW link

AISN #59: EU Publishes General-Purpose AI Code of Practice

Corin Katzke and Dan H

15 Jul 2025 18:59 UTC

10 points

0 comments · 4 min read · LW link

(aisafety.substack.com)

Principles for Picking Practical Interpretability Projects

Sam Marks

15 Jul 2025 17:38 UTC

33 points

0 comments · 13 min read · LW link

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Tomek Korbak, Mikita Balesni, Vlad Mikulik and Rohin Shah

15 Jul 2025 16:23 UTC

169 points

32 comments · 1 min read · LW link

(bit.ly)

The Virtue of Fear and the Myth of “Fearlessness”

David_Veksler

15 Jul 2025 16:10 UTC

7 points

3 comments · 1 min read · LW link

Grok 4 Various Things

Zvi

15 Jul 2025 15:50 UTC

52 points

4 comments · 32 min read · LW link

(thezvi.wordpress.com)

Value systems of the frontier AIs, reduced to slogans

Mitchell_Porter

15 Jul 2025 15:10 UTC

4 points

0 comments · 1 min read · LW link

What is David Chapman talking about when he talks about “meaning” in his book “Meaningness”?

SpectrumDT

15 Jul 2025 14:29 UTC

23 points

15 comments · 2 min read · LW link

Why Eliminating Deception Won’t Align AI

Priyanka Bharadwaj

15 Jul 2025 9:21 UTC

19 points

6 comments · 4 min read · LW link

Generalizing zombie arguments

jessicata

15 Jul 2025 5:09 UTC

25 points

9 comments · 7 min read · LW link

(unstableontology.com)

Do confident short timelines make sense?

TsviBT and abramdemski

15 Jul 2025 3:37 UTC

140 points

78 comments · 69 min read · LW link

Critic Contributions Are Logically Irrelevant

Zack_M_Davis

15 Jul 2025 1:03 UTC

27 points

74 comments · 6 min read · LW link

AISafety.com Hackathon 2025

Bryce Robertson

15 Jul 2025 0:04 UTC

12 points

0 comments · 1 min read · LW link

Don’t Say “I Want to Work In AI Policy”

henryj

14 Jul 2025 23:19 UTC

7 points

0 comments · 2 min read · LW link

(www.henryjosephson.com)

Recent Redwood Research project proposals

ryan_greenblatt, Buck, Julian Stastny, joshc, Alex Mallen, Adam Kaufman, Tyler Tracy, Aryan Bhatt and Joey Yudelson

14 Jul 2025 22:27 UTC

93 points

0 comments · 3 min read · LW link

The Role of Respect: Why we inevitably appeal to authority

jimmy

14 Jul 2025 21:28 UTC

18 points

2 comments · 12 min read · LW link

Making Sense of Consciousness Part 3: The Pulvinar Nucleus

sarahconstantin

14 Jul 2025 21:20 UTC

14 points

0 comments · 10 min read · LW link

(sarahconstantin.substack.com)