FrontierMath Score of o3-mini Much Lower Than Claimed

YafahEdelman · 17 Mar 2025 22:41 UTC
61 points
7 comments · 1 min read · LW link

Proof-of-Concept Debugger for a Small LLM

17 Mar 2025 22:27 UTC
27 points
0 comments · 11 min read · LW link

Effectively Communicating with DC Policymakers

PolicyTakes · 17 Mar 2025 22:11 UTC
14 points
0 comments · 2 min read · LW link

EIS XV: A New Proof of Concept for Useful Interpretability

scasper · 17 Mar 2025 20:05 UTC
30 points
2 comments · 3 min read · LW link

Sentinel’s Global Risks Weekly Roundup #11/2025. Trump invokes Alien Enemies Act, Chinese invasion barges deployed in exercise.

NunoSempere · 17 Mar 2025 19:34 UTC
59 points
3 comments · 6 min read · LW link
(blog.sentinel-team.org)

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

17 Mar 2025 19:11 UTC
188 points
9 comments · 6 min read · LW link

Three Types of Intelligence Explosion

17 Mar 2025 14:47 UTC
40 points
8 comments · 3 min read · LW link
(www.forethought.org)

An Advent of Thought

Kaarel · 17 Mar 2025 14:21 UTC
57 points
13 comments · 48 min read · LW link

Interested in working from a new Boston AI Safety Hub?

17 Mar 2025 13:42 UTC
17 points
0 comments · 2 min read · LW link

Other Civilizations Would Recover 84+% of Our Cosmic Resources—A Challenge to Extinction Risk Prioritization

Maxime Riché · 17 Mar 2025 13:12 UTC
5 points
0 comments · 12 min read · LW link

Monthly Roundup #28: March 2025

Zvi · 17 Mar 2025 12:50 UTC
31 points
8 comments · 14 min read · LW link
(thezvi.wordpress.com)

Are corporations superintelligent?

17 Mar 2025 10:36 UTC
3 points
3 comments · 1 min read · LW link
(aisafety.info)

One pager

samuelshadrach · 17 Mar 2025 8:12 UTC
6 points
2 comments · 8 min read · LW link
(samuelshadrach.com)

The Case for AI Optimism

Annapurna · 17 Mar 2025 1:29 UTC
−6 points
1 comment · 1 min read · LW link
(nationalaffairs.com)

Systematic runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

16 Mar 2025 23:23 UTC
45 points
8 comments · 13 min read · LW link

Read More News

utilistrutil · 16 Mar 2025 21:31 UTC
25 points
2 comments · 5 min read · LW link

What would a post labor economy *actually* look like?

Ansh Juneja · 16 Mar 2025 20:38 UTC
3 points
2 comments · 17 min read · LW link

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas · 16 Mar 2025 18:54 UTC
206 points
36 comments · 3 min read · LW link

How I’ve run major projects

benkuhn · 16 Mar 2025 18:40 UTC
127 points
10 comments · 8 min read · LW link
(www.benkuhn.net)

Counting Objections to Housing

jefftk · 16 Mar 2025 18:20 UTC
13 points
7 comments · 3 min read · LW link
(www.jefftk.com)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy · 16 Mar 2025 16:52 UTC
161 points
26 comments · 1 min read · LW link

Siberian Arctic origins of East Asian psychology

David Sun · 16 Mar 2025 16:52 UTC
6 points
0 comments · 1 min read · LW link

AI Model History is Being Lost

Vale · 16 Mar 2025 12:38 UTC
19 points
1 comment · 1 min read · LW link
(vale.rocks)

Metacognition Broke My Nail-Biting Habit

Rafka · 16 Mar 2025 12:36 UTC
45 points
20 comments · 2 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt · 16 Mar 2025 8:06 UTC
22 points
8 comments · 1 min read · LW link

Can time preferences make AI safe?

TerriLeaf · 15 Mar 2025 21:41 UTC
2 points
1 comment · 2 min read · LW link

Help make the orca language experiment happen

Towards_Keeperhood · 15 Mar 2025 21:39 UTC
9 points
12 comments · 5 min read · LW link

Announcing EXP: Experimental Summer Workshop on Collective Cognition

15 Mar 2025 20:14 UTC
36 points
2 comments · 4 min read · LW link

AI Self-Correction vs. Self-Reflection: Is There a Fundamental Difference?

Project Solon · 15 Mar 2025 18:24 UTC
−3 points
0 comments · 1 min read · LW link

The Fork in the Road

testingthewaters · 15 Mar 2025 17:36 UTC
14 points
12 comments · 2 min read · LW link

Any-Benefit Mindset and Any-Reason Reasoning

silentbob · 15 Mar 2025 17:10 UTC
36 points
9 comments · 6 min read · LW link

deleted

funnyfranco · 15 Mar 2025 15:24 UTC
−1 points
2 comments · 1 min read · LW link

Paper: Field-building and the epistemic culture of AI safety

peterslattery · 15 Mar 2025 12:30 UTC
13 points
3 comments · 3 min read · LW link
(firstmonday.org)

deleted

funnyfranco · 15 Mar 2025 6:08 UTC
8 points
0 comments · 1 min read · LW link

AI Says It’s Not Conscious. That’s a Bad Answer to the Wrong Question.

JohnMarkNorman · 15 Mar 2025 1:25 UTC
1 point
0 comments · 2 min read · LW link

Report & retrospective on the Dovetail fellowship

Alex_Altair · 14 Mar 2025 23:20 UTC
26 points
3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira · 14 Mar 2025 23:07 UTC
11 points
4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse · 14 Mar 2025 22:51 UTC
14 points
20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape · 14 Mar 2025 22:29 UTC
110 points
28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma · 14 Mar 2025 21:18 UTC
2 points
2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus · 14 Mar 2025 20:54 UTC
7 points
2 comments · 1 min read · LW link
(asving.com)

AI for Epistemics Hackathon

Austin Chen · 14 Mar 2025 20:46 UTC
76 points
12 comments · 10 min read · LW link
(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson · 14 Mar 2025 19:11 UTC
16 points
0 comments · 8 min read · LW link

AI Tools for Existential Security

14 Mar 2025 18:38 UTC
22 points
4 comments · 11 min read · LW link
(www.forethought.org)

deleted

funnyfranco · 14 Mar 2025 18:14 UTC
−3 points
2 comments · 1 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron · 14 Mar 2025 15:45 UTC
5 points
0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith · 14 Mar 2025 15:00 UTC
79 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Evaluating the ROI of Information

Mr. Keating · 14 Mar 2025 14:22 UTC
13 points
3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi · 14 Mar 2025 12:30 UTC
53 points
2 comments · 13 min read · LW link
(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd · 14 Mar 2025 9:48 UTC
28 points
2 comments · 9 min read · LW link