Superintelligence’s goals are likely to be random

Mikhail Samin · 13 Mar 2025 22:41 UTC
6 points
6 comments · 5 min read · LW link

Should AI safety be a mass movement?

MattAlexander · 13 Mar 2025 20:36 UTC
5 points
1 comment · 4 min read · LW link

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
145 points
15 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

13 Mar 2025 19:09 UTC
162 points
46 comments · 6 min read · LW link

Vacuum Decay: Expert Survey Results

JessRiedel · 13 Mar 2025 18:31 UTC
96 points
26 comments · 13 min read · LW link

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

13 Mar 2025 18:29 UTC
10 points
0 comments · 1 min read · LW link
(arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents

theraven · 13 Mar 2025 18:17 UTC
6 points
1 comment · 6 min read · LW link

Habermas Machine

NicholasKees · 13 Mar 2025 18:16 UTC
54 points
7 comments · 6 min read · LW link
(mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us

Peterpiper · 13 Mar 2025 18:03 UTC
−2 points
0 comments · 3 min read · LW link

AI #107: The Misplaced Hype Machine

Zvi · 13 Mar 2025 14:40 UTC
47 points
12 comments · 40 min read · LW link
(thezvi.wordpress.com)

Intelsat as a Model for International AGI Governance

13 Mar 2025 12:58 UTC
45 points
0 comments · 1 min read · LW link
(www.forethought.org)

Stacity: a Lock-In Risk Benchmark for Large Language Models

alamerton · 13 Mar 2025 12:08 UTC
4 points
0 comments · 1 min read · LW link
(huggingface.co)

The prospect of accelerated AI safety progress, including philosophical progress

Mitchell_Porter · 13 Mar 2025 10:52 UTC
12 points
0 comments · 4 min read · LW link

The “Reversal Curse”: you still aren’t antropomorphising enough.

lumpenspace · 13 Mar 2025 10:24 UTC
3 points
0 comments · 1 min read · LW link
(lumpenspace.substack.com)

Formalizing Space-Faring Civilizations Saturation concepts and metrics

Maxime Riché · 13 Mar 2025 9:40 UTC
4 points
0 comments · 8 min read · LW link

The Economics of p(doom)

Jakub Growiec · 13 Mar 2025 7:33 UTC
2 points
0 comments · 1 min read · LW link

Social Media: How to fix them before they become the biggest news platform

Sam G · 13 Mar 2025 7:28 UTC
5 points
2 comments · 3 min read · LW link

Penny Whistle in E?

jefftk · 13 Mar 2025 2:40 UTC
9 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Anthropic, and taking “technical philosophy” more seriously

Raemon · 13 Mar 2025 1:48 UTC
139 points
29 comments · 11 min read · LW link

LW/ACX Social Meetup

Stefan · 12 Mar 2025 23:13 UTC
2 points
0 comments · 1 min read · LW link

I grade every NBA basketball game I watch based on enjoyability

proshowersinger · 12 Mar 2025 21:46 UTC
24 points
2 comments · 4 min read · LW link

Kairos is hiring a Head of Operations/Founding Generalist

agucova · 12 Mar 2025 20:58 UTC
6 points
0 comments · 5 min read · LW link

USAID Outlook: A Metaculus Forecasting Series

ChristianWilliams · 12 Mar 2025 20:34 UTC
9 points
0 comments · 1 min read · LW link
(www.metaculus.com)

What is instrumental convergence?

12 Mar 2025 20:28 UTC
2 points
0 comments · 2 min read · LW link
(aisafety.info)

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs

Sanyu Rajakumar · 12 Mar 2025 17:56 UTC
16 points
0 comments · 13 min read · LW link

Why Obedient AI May Be the Real Catastrophe

G~ · 12 Mar 2025 17:50 UTC
5 points
2 comments · 3 min read · LW link

Your Communication Preferences Aren’t Law

Jonathan Moregård · 12 Mar 2025 17:20 UTC
25 points
4 comments · 1 min read · LW link
(honestliving.substack.com)

Reflections on Neuralese

Alice Blair · 12 Mar 2025 16:29 UTC
42 points
3 comments · 5 min read · LW link

Field tests of semi-rationality in Brazilian military training

P. João · 12 Mar 2025 16:14 UTC
31 points
0 comments · 2 min read · LW link

Many life-saving drugs fail for lack of funding. But there’s a solution: desperate rich people

Mvolz · 12 Mar 2025 15:24 UTC
17 points
0 comments · 1 min read · LW link
(www.theguardian.com)

The Most Forbidden Technique

Zvi · 12 Mar 2025 13:20 UTC
165 points
9 comments · 17 min read · LW link
(thezvi.wordpress.com)

You don’t actually need a physical multiverse to explain anthropic fine-tuning.

Fraser · 12 Mar 2025 7:33 UTC
7 points
8 comments · 3 min read · LW link
(frvser.com)

AI Can’t Write Good Fiction

JustisMills · 12 Mar 2025 6:11 UTC
38 points
24 comments · 7 min read · LW link
(justismills.substack.com)

Existing UDTs test the limits of Bayesianism (and consistency)

Cole Wyeth · 12 Mar 2025 4:09 UTC
28 points
22 comments · 7 min read · LW link

(Anti)Aging 101

George3d6 · 12 Mar 2025 3:59 UTC
5 points
2 comments · 3 min read · LW link
(cerebralab.com)

The Grapes of Hardness

adamShimi · 11 Mar 2025 21:01 UTC
8 points
0 comments · 5 min read · LW link
(formethods.substack.com)

Don’t over-update on FrontierMath results

David Matolcsi · 11 Mar 2025 20:44 UTC
47 points
7 comments · 9 min read · LW link

Response to Scott Alexander on Imprisonment

Zvi · 11 Mar 2025 20:40 UTC
40 points
4 comments · 9 min read · LW link
(thezvi.wordpress.com)

Paths and waystations in AI safety

Joe Carlsmith · 11 Mar 2025 18:52 UTC
42 points
1 comment · 11 min read · LW link
(joecarlsmith.substack.com)

Meridian Cambridge Visiting Researcher Programme: Turn AI safety ideas into funded projects in one week!

Meridian Cambridge · 11 Mar 2025 17:46 UTC
13 points
0 comments · 2 min read · LW link

Elon Musk May Be Transitioning to Bipolar Type I

Cyborg25 · 11 Mar 2025 17:45 UTC
87 points
22 comments · 4 min read · LW link

Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated?

Katalina Hernandez · 11 Mar 2025 16:51 UTC
3 points
1 comment · 3 min read · LW link

How Language Models Understand Nullability

11 Mar 2025 15:57 UTC
5 points
0 comments · 2 min read · LW link
(dmodel.ai)

Forethought: a new AI macrostrategy group

11 Mar 2025 15:39 UTC
20 points
0 comments · 3 min read · LW link

Preparing for the Intelligence Explosion

11 Mar 2025 15:38 UTC
79 points
17 comments · 1 min read · LW link
(www.forethought.org)

stop solving problems that have already been solved

dhruvmethi · 11 Mar 2025 15:30 UTC
10 points
3 comments · 8 min read · LW link

AI Control May Increase Existential Risk

Jan_Kulveit · 11 Mar 2025 14:30 UTC
101 points
13 comments · 1 min read · LW link

When is it Better to Train on the Alignment Proxy?

dil-leik-og · 11 Mar 2025 13:35 UTC
14 points
0 comments · 9 min read · LW link

A different take on the Musk v OpenAI preliminary injunction order

TFD · 11 Mar 2025 12:46 UTC
8 points
0 comments · 20 min read · LW link
(www.thefloatingdroid.com)

Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

Fabien Roger · 11 Mar 2025 11:52 UTC
127 points
23 comments · 11 min read · LW link
(alignment.anthropic.com)