Report & retrospective on the Dovetail fellowship

Alex_Altair

Mar 14, 2025, 11:20 PM

26 points

3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira

Mar 14, 2025, 11:07 PM

11 points

4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse

Mar 14, 2025, 10:51 PM

14 points

20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape

Mar 14, 2025, 10:29 PM

109 points

28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma

Mar 14, 2025, 9:18 PM

2 points

2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus

Mar 14, 2025, 8:54 PM

7 points

1 comment · 1 min read · LW link

(asving.com)

AI for Epistemics Hackathon

Austin Chen

Mar 14, 2025, 8:46 PM

77 points

12 comments · 10 min read · LW link

(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson

Mar 14, 2025, 7:11 PM

16 points

0 comments · 8 min read · LW link

AI Tools for Existential Security

Lizka and owencb

Mar 14, 2025, 6:38 PM

22 points

4 comments · 11 min read · LW link

(www.forethought.org)

Capitalism as the Catalyst for AGI-Induced Human Extinction

funnyfranco

Mar 14, 2025, 6:14 PM

−3 points

2 comments · 21 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron

Mar 14, 2025, 3:45 PM

3 points

0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith

Mar 14, 2025, 3:00 PM

78 points

13 comments · 17 min read · LW link

(joecarlsmith.substack.com)

Evaluating the ROI of Information

Declan Molony

Mar 14, 2025, 2:22 PM

12 points

3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi

Mar 14, 2025, 12:30 PM

53 points

2 comments · 13 min read · LW link

(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd

Mar 14, 2025, 9:48 AM

24 points

2 comments · 9 min read · LW link

Something to fight for

RomanS

Mar 14, 2025, 8:27 AM

4 points

0 comments · 1 min read · LW link

Interpreting Complexity

Maxwell Adam

Mar 14, 2025, 4:52 AM

53 points

8 comments · 26 min read · LW link

Bike Lights are Cheap Enough to Give Away

jefftk

Mar 14, 2025, 2:10 AM

24 points

0 comments · 1 min read · LW link

(www.jefftk.com)

Superintelligence’s goals are likely to be random

Mikhail Samin

Mar 13, 2025, 10:41 PM

6 points

6 comments · 5 min read · LW link

Should AI safety be a mass movement?

mhampton

Mar 13, 2025, 8:36 PM

5 points

1 comment · 4 min read · LW link

Auditing language models for hidden objectives

Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub

Mar 13, 2025, 7:18 PM

141 points

15 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio

Mar 13, 2025, 7:09 PM

155 points

41 comments · 6 min read · LW link

Vacuum Decay: Expert Survey Results

JessRiedel

Mar 13, 2025, 6:31 PM

96 points

26 comments · LW link

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

simeon_c and Henry Papadatos

Mar 13, 2025, 6:29 PM

10 points

0 comments · 1 min read · LW link

(arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents

theraven

Mar 13, 2025, 6:17 PM

6 points

1 comment · 6 min read · LW link

Habermas Machine

NicholasKees

Mar 13, 2025, 6:16 PM

49 points

7 comments · 6 min read · LW link

(mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us

Peterpiper

Mar 13, 2025, 6:03 PM

−3 points

0 comments · 3 min read · LW link

AI #107: The Misplaced Hype Machine

Zvi

Mar 13, 2025, 2:40 PM

47 points

10 comments · 40 min read · LW link

(thezvi.wordpress.com)

Intelsat as a Model for International AGI Governance

rosehadshar and wdmacaskill

Mar 13, 2025, 12:58 PM

45 points

0 comments · 1 min read · LW link

(www.forethought.org)

Stacity: a Lock-In Risk Benchmark for Large Language Models

alamerton

Mar 13, 2025, 12:08 PM

4 points

0 comments · 1 min read · LW link

(huggingface.co)

The prospect of accelerated AI safety progress, including philosophical progress

Mitchell_Porter

Mar 13, 2025, 10:52 AM

11 points

0 comments · 4 min read · LW link

The “Reversal Curse”: you still aren’t antropomorphising enough.

lumpenspace

Mar 13, 2025, 10:24 AM

3 points

0 comments · 1 min read · LW link

(lumpenspace.substack.com)

Formalizing Space-Faring Civilizations Saturation concepts and metrics

Maxime Riché

Mar 13, 2025, 9:40 AM

4 points

0 comments · 8 min read · LW link

The Economics of p(doom)

Jakub Growiec

Mar 13, 2025, 7:33 AM

2 points

0 comments · 1 min read · LW link

Social Media: How to fix them before they become the biggest news platform

Sam G

Mar 13, 2025, 7:28 AM

5 points

2 comments · 3 min read · LW link

Penny Whistle in E?

jefftk

Mar 13, 2025, 2:40 AM

9 points

1 comment · 1 min read · LW link

(www.jefftk.com)

Anthropic, and taking “technical philosophy” more seriously

Raemon

Mar 13, 2025, 1:48 AM

125 points

29 comments · 11 min read · LW link

LW/ACX Social Meetup

Stefan

Mar 12, 2025, 11:13 PM

2 points

0 comments · 1 min read · LW link

I grade every NBA basketball game I watch based on enjoyability

proshowersinger

Mar 12, 2025, 9:46 PM

24 points

2 comments · 4 min read · LW link

Kairos is hiring a Head of Operations/Founding Generalist

agucova

Mar 12, 2025, 8:58 PM

6 points

0 comments · LW link

USAID Outlook: A Metaculus Forecasting Series

ChristianWilliams

Mar 12, 2025, 8:34 PM

9 points

0 comments · LW link

(www.metaculus.com)

What is instrumental convergence?

Vishakha and Algon

Mar 12, 2025, 8:28 PM

2 points

0 comments · 2 min read · LW link

(aisafety.info)

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

Sanyu Rajakumar

Mar 12, 2025, 5:56 PM

16 points

0 comments · 13 min read · LW link

Why Obedient AI May Be the Real Catastrophe

G~

Mar 12, 2025, 5:50 PM

5 points

2 comments · 3 min read · LW link

Your Communication Preferences Aren’t Law

Jonathan Moregård

Mar 12, 2025, 5:20 PM

25 points

4 comments · 1 min read · LW link

(honestliving.substack.com)

Reflections on Neuralese

Alice Blair

Mar 12, 2025, 4:29 PM

28 points

0 comments · 5 min read · LW link

Field tests of semi-rationality in Brazilian military training

P. João

Mar 12, 2025, 4:14 PM

31 points

0 comments · 2 min read · LW link

Many life-saving drugs fail for lack of funding. But there’s a solution: desperate rich people

Mvolz

Mar 12, 2025, 3:24 PM

17 points

0 comments · 1 min read · LW link

(www.theguardian.com)

The Most Forbidden Technique

Zvi

Mar 12, 2025, 1:20 PM

143 points

9 comments · 17 min read · LW link

(thezvi.wordpress.com)

You don’t actually need a physical multiverse to explain anthropic fine-tuning.

Fraser

Mar 12, 2025, 7:33 AM

7 points

8 comments · 3 min read · LW link

(frvser.com)