So You Want To Make Marginal Progress...

johnswentworth

7 Feb 2025 23:22 UTC

311 points

42 comments · 4 min read · LW link

Reasons-based choice and cluelessness

JesseClifton

7 Feb 2025 22:21 UTC

35 points

0 comments · 10 min read · LW link

[Translation] In the Age of AI don’t Look for Unicorns

mushroomsoup

7 Feb 2025 21:06 UTC

3 points

0 comments · 10 min read · LW link

Racing Towards Fusion and AI

Jeffrey Heninger

7 Feb 2025 20:40 UTC

49 points

11 comments · 7 min read · LW link

‘High-Level Machine Intelligence’ and ‘Full Automation of Labor’ in the AI Impacts Surveys

Jeffrey Heninger

7 Feb 2025 20:40 UTC

11 points

1 comment · 7 min read · LW link

Request for Information for a new US AI Action Plan (OSTP RFI)

agucova

7 Feb 2025 20:40 UTC

5 points

0 comments · 2 min read · LW link

(www.federalregister.gov)

A Problem to Solve Before Building a Deception Detector

Eleni Angelou and lewis smith

7 Feb 2025 19:35 UTC

78 points

12 comments · 14 min read · LW link

Request for proposals: improving capability evaluations

cb

7 Feb 2025 18:51 UTC

1 point

0 comments · 1 min read · LW link

(www.openphilanthropy.org)

How AI Takeover Might Happen in 2 Years

joshc

7 Feb 2025 17:10 UTC

431 points

142 comments · 29 min read · LW link

(x.com)

the devil’s ontology

lostinwilliamsburg

7 Feb 2025 14:18 UTC

−1 points

14 comments · 6 min read · LW link

On the Meta and DeepMind Safety Frameworks

Zvi

7 Feb 2025 13:10 UTC

45 points

1 comment · 17 min read · LW link

(thezvi.wordpress.com)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

ChengCheng, Brendan Murphy, Adrià Garriga-alonso, Yashvardhan Sharma, dsbowen, smallsilo, Yawen Duan, ChrisCundy, Hannah Betts, AdamGleave and Kellin Pelrine

7 Feb 2025 3:57 UTC

37 points

0 comments · 10 min read · LW link

When you downvote, explain why

KvmanThinking

7 Feb 2025 1:03 UTC

7 points

31 comments · 1 min read · LW link

Medical Windfall Prizes

PeterMcCluskey

6 Feb 2025 23:33 UTC

5 points

1 comment · 5 min read · LW link

(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

Sinem, pandelis and Adam Newgas

6 Feb 2025 19:18 UTC

11 points

0 comments · 9 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

jake_mendel, maxnadeau and Peter Favaloro

6 Feb 2025 18:58 UTC

111 points

0 comments · 1 min read · LW link

(www.openphilanthropy.org)

AISN #47: Reasoning Models

Corin Katzke and Dan H

6 Feb 2025 18:52 UTC

3 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

Bentham's Bulldog

6 Feb 2025 16:15 UTC

27 points

18 comments · 7 min read · LW link

Detecting Strategic Deception Using Linear Probes

Nicholas Goldowsky-Dill, bilalchughtai, StefanHex and Marius Hobbhahn

6 Feb 2025 15:46 UTC

104 points

9 comments · 2 min read · LW link

(arxiv.org)

AI #102: Made in America

Zvi

6 Feb 2025 14:20 UTC

26 points

18 comments · 67 min read · LW link

(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions

6 Feb 2025 11:26 UTC

−3 points

5 comments · 2 min read · LW link

(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda

6 Feb 2025 11:03 UTC

73 points

7 comments · 8 min read · LW link

Don’t go bankrupt, don’t go rogue

Nathan Young

6 Feb 2025 10:31 UTC

20 points

1 comment · 7 min read · LW link

Voting Results for the 2023 Review

Raemon

6 Feb 2025 8:00 UTC

88 points

3 comments · 69 min read · LW link

Chicanery: No

Screwtape

6 Feb 2025 5:42 UTC

36 points

11 comments · 5 min read · LW link

[Question] hypnosis question

KvmanThinking

6 Feb 2025 2:41 UTC

3 points

5 comments · 1 min read · LW link

BIDA Calendar iCal Feed

jefftk

6 Feb 2025 1:30 UTC

10 points

0 comments · 1 min read · LW link

(www.jefftk.com)

C’mon guys, Deliberate Practice is Real

Raemon

5 Feb 2025 22:33 UTC

102 points

25 comments · 9 min read · LW link

The Risk of Gradual Disempowerment from AI

Zvi

5 Feb 2025 22:10 UTC

87 points

20 comments · 20 min read · LW link

(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon

5 Feb 2025 21:32 UTC

88 points

45 comments · 2 min read · LW link

(web.archive.org)

On AI Scaling

harsimony

5 Feb 2025 20:24 UTC

6 points

3 comments · 8 min read · LW link

(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams

5 Feb 2025 19:17 UTC

21 points

0 comments · 6 min read · LW link

(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox

5 Feb 2025 18:58 UTC

20 points

0 comments · 11 min read · LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev

5 Feb 2025 18:58 UTC

13 points

0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani

5 Feb 2025 18:56 UTC

5 points

0 comments · 5 min read · LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay

5 Feb 2025 18:17 UTC

6 points

7 comments · 1 min read · LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris

5 Feb 2025 16:02 UTC

7 points

0 comments · 1 min read · LW link

Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety

jamesnorris

5 Feb 2025 16:02 UTC

−7 points

2 comments · 1 min read · LW link

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni

5 Feb 2025 13:50 UTC

80 points

1 comment · 10 min read · LW link

Deploying the Observer will save humanity from existential threats

Aram Panasenco

5 Feb 2025 10:39 UTC

−11 points

8 comments · 1 min read · LW link

The Domain of Orthogonality

mgfcatherall

5 Feb 2025 8:14 UTC

1 point

0 comments · 7 min read · LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape

5 Feb 2025 4:30 UTC

97 points

18 comments · 6 min read · LW link

[Question] Why isn’t AI containment the primary AI safety strategy?

Oliver Kuperman

5 Feb 2025 3:54 UTC

1 point

3 comments · 3 min read · LW link

Nick Land: Orthogonality

lumpenspace

4 Feb 2025 21:07 UTC

5 points

37 comments · 8 min read · LW link

What working on AI safety taught me about B2B SaaS sales

purple fire

4 Feb 2025 20:50 UTC

7 points

12 comments · 5 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

Daniel Herrmann, Aydin Mohseni and ben_levinstein

4 Feb 2025 20:34 UTC

48 points

22 comments · 5 min read · LW link

Anti-Slop Interventions?

abramdemski

4 Feb 2025 19:50 UTC

78 points

33 comments · 6 min read · LW link

Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails

Devina Jain

4 Feb 2025 19:10 UTC

9 points

0 comments · 10 min read · LW link

[Question] Journalism student looking for sources

pinkerton

4 Feb 2025 18:58 UTC

11 points

3 comments · 1 min read · LW link

We’re in Deep Research

Zvi

4 Feb 2025 17:20 UTC

45 points

3 comments · 20 min read · LW link

(thezvi.wordpress.com)