Defining Monitorable and Useful Goals

Rubi J. Hudson
15 Jul 2025 23:06 UTC
11 points
0 comments
16 min read

[Question] Do you have any recommendations for readings on global risk forecasting and analysis applied to public policy design on a slightly smaller scale, or for more specific objectives?

Ana Lopez
15 Jul 2025 22:00 UTC
1 point
0 comments
1 min read

1 week fast on livestream for AI xrisk

samuelshadrach
15 Jul 2025 21:36 UTC
1 point
2 comments
1 min read

AISN #59: EU Publishes General-Purpose AI Code of Practice

15 Jul 2025 18:59 UTC
10 points
0 comments
4 min read
(aisafety.substack.com)

Principles for Picking Practical Interpretability Projects

Sam Marks
15 Jul 2025 17:38 UTC
27 points
0 comments
13 min read

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

15 Jul 2025 16:23 UTC
166 points
32 comments
1 min read
(bit.ly)

The Virtue of Fear and the Myth of “Fearlessness”

David_Veksler
15 Jul 2025 16:10 UTC
7 points
3 comments
1 min read

Grok 4 Various Things

Zvi
15 Jul 2025 15:50 UTC
50 points
4 comments
32 min read
(thezvi.wordpress.com)

Value systems of the frontier AIs, reduced to slogans

Mitchell_Porter
15 Jul 2025 15:10 UTC
4 points
0 comments
1 min read

What is David Chapman talking about when he talks about “meaning” in his book “Meaningness”?

SpectrumDT
15 Jul 2025 14:29 UTC
22 points
15 comments
2 min read

Why Eliminating Deception Won’t Align AI

Priyanka Bharadwaj
15 Jul 2025 9:21 UTC
19 points
6 comments
4 min read

Generalizing zombie arguments

jessicata
15 Jul 2025 5:09 UTC
23 points
9 comments
7 min read
(unstableontology.com)

Do confident short timelines make sense?

15 Jul 2025 3:37 UTC
138 points
76 comments
69 min read

Critic Contributions Are Logically Irrelevant

Zack_M_Davis
15 Jul 2025 1:03 UTC
27 points
74 comments
6 min read

AISafety.com Hackathon 2025

Bryce Robertson
15 Jul 2025 0:04 UTC
12 points
0 comments
1 min read

Don’t Say “I Want to Work In AI Policy”

henryj
14 Jul 2025 23:19 UTC
5 points
0 comments
2 min read
(www.henryjosephson.com)

Recent Redwood Research project proposals

14 Jul 2025 22:27 UTC
91 points
0 comments
3 min read

The Role of Respect: Why we inevitably appeal to authority

jimmy
14 Jul 2025 21:28 UTC
18 points
2 comments
12 min read

Making Sense of Consciousness Part 3: The Pulvinar Nucleus

sarahconstantin
14 Jul 2025 21:20 UTC
14 points
0 comments
10 min read
(sarahconstantin.substack.com)

LLM-induced craziness and base rates

Kaj_Sotala
14 Jul 2025 21:16 UTC
70 points
2 comments
2 min read
(andymasley.substack.com)

Narrow Misalignment is Hard, Emergent Misalignment is Easy

14 Jul 2025 21:05 UTC
130 points
23 comments
5 min read

What do you Want out of Literature Reviews?

Elizabeth
14 Jul 2025 20:20 UTC
25 points
4 comments
4 min read
(acesounderglass.com)

The Three Ideological Stances

14 Jul 2025 20:14 UTC
2 points
0 comments
3 min read
(cognition.cafe)

Visualizing AI Alignment – CFP for AGI-2025 Workshop (Aug 10, Live + Virtual)

CC4CI
14 Jul 2025 20:12 UTC
9 points
0 comments
4 min read

[Question] Is the political right becoming actively, explicitly antisemitic?

lc
14 Jul 2025 18:57 UTC
28 points
16 comments
1 min read

Weird Features in Protein LLMs: The Gram Lens

Jude Stiel
14 Jul 2025 17:32 UTC
8 points
0 comments
9 min read

METR: How Does Time Horizon Vary Across Domains?

14 Jul 2025 16:13 UTC
84 points
8 comments
14 min read
(metr.org)

Worse Than MechaHitler

Zvi
14 Jul 2025 16:00 UTC
53 points
1 comment
22 min read
(thezvi.wordpress.com)

How To Cause Less Suffering While Eating Animals

Bentham's Bulldog
14 Jul 2025 15:59 UTC
11 points
3 comments
4 min read

Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

14 Jul 2025 14:52 UTC
67 points
18 comments
11 min read

Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview

Matrice Jacobine
14 Jul 2025 14:47 UTC
42 points
2 comments
1 min read
(gizmodo.com)

Arrow theorem is an artifact of ordinal preferences

Arturo Macias
14 Jul 2025 14:08 UTC
7 points
4 comments
4 min read

Shanzson AI 2027 Timeline

shanzson
14 Jul 2025 10:21 UTC
13 points
11 comments
8 min read
(mirror.xyz)

Lead, Own, Share: Sovereign Wealth Funds for Transformative AI

Matrice Jacobine
14 Jul 2025 9:34 UTC
8 points
0 comments
1 min read
(www.convergenceanalysis.org)

Deliberative Credit Assignment: Making Faithful Reasoning Profitable

Florian_Dietz
14 Jul 2025 9:26 UTC
9 points
3 comments
17 min read

The History of FSRS for Anki

L.M.Sherlock
14 Jul 2025 8:11 UTC
26 points
0 comments
14 min read
(l-m-sherlock.notion.site)

Don’t fight your LLM, redirect it!

Yair Halberstadt
14 Jul 2025 6:50 UTC
19 points
2 comments
1 min read

Actionable Moderation Proposals from comments tree

ProgramCrafter
14 Jul 2025 6:41 UTC
6 points
0 comments
2 min read

Aspiring to Great Solstice Speeches: Mostly-Obvious Advice

Czynski
14 Jul 2025 2:29 UTC
9 points
5 comments
14 min read

Why are effect sizes so small?

Jacob Goldsmith
14 Jul 2025 1:17 UTC
1 point
0 comments
4 min read

Liv Boeree—non-zero hero

James Stephen Brown
13 Jul 2025 23:49 UTC
1 point
0 comments
2 min read
(nonzerosum.games)

Moloch’s Demise—solving the original problem

James Stephen Brown
13 Jul 2025 23:29 UTC
9 points
8 comments
1 min read
(nonzerosum.games)

4 Ways Moloch is Ruining Your Life!—a listicle that shows Moloch is all around us, even in listicles

James Stephen Brown
13 Jul 2025 23:27 UTC
5 points
0 comments
2 min read
(nonzerosum.games)

Three Missing Cakes, or One Turbulent Critic?

Benquo
13 Jul 2025 23:08 UTC
31 points
21 comments
3 min read

O(1) reasoning in latent space: 1ms inference, 77% accuracy, no attention or tokens

Founder Order One
13 Jul 2025 22:54 UTC
−11 points
9 comments
2 min read

On actually taking expressions literally: tension as the key to meditation?

Chris_Leong
13 Jul 2025 22:49 UTC
16 points
12 comments
5 min read

[Question] Why is LW not about winning?

azergante
13 Jul 2025 22:36 UTC
21 points
21 comments
1 min read

LLMs are stuck in Plato’s cave

Sean Herrington
13 Jul 2025 20:37 UTC
7 points
3 comments
6 min read

Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings

13 Jul 2025 19:54 UTC
51 points
5 comments
18 min read

10x more train­ing com­pute = 5x greater task length (kind of)

Expertium
13 Jul 2025 18:40 UTC
48 points
8 comments
2 min read