Medical Windfall Prizes

PeterMcCluskey · 6 Feb 2025 23:33 UTC
5 points
1 comment · 5 min read · LW link
(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

6 Feb 2025 19:18 UTC
11 points
0 comments · 9 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

6 Feb 2025 18:58 UTC
111 points
0 comments · 1 min read · LW link
(www.openphilanthropy.org)

AISN #47: Reasoning Models

6 Feb 2025 18:52 UTC
3 points
0 comments · 4 min read · LW link
(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

Bentham's Bulldog · 6 Feb 2025 16:15 UTC
26 points
18 comments · 7 min read · LW link

Detecting Strategic Deception Using Linear Probes

6 Feb 2025 15:46 UTC
104 points
9 comments · 2 min read · LW link
(arxiv.org)

AI #102: Made in America

Zvi · 6 Feb 2025 14:20 UTC
26 points
18 comments · 67 min read · LW link
(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions · 6 Feb 2025 11:26 UTC
−3 points
5 comments · 2 min read · LW link
(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda · 6 Feb 2025 11:03 UTC
73 points
7 comments · 8 min read · LW link

Don’t go bankrupt, don’t go rogue

Nathan Young · 6 Feb 2025 10:31 UTC
20 points
1 comment · 7 min read · LW link

Voting Results for the 2023 Review

Raemon · 6 Feb 2025 8:00 UTC
86 points
3 comments · 69 min read · LW link

Chicanery: No

Screwtape · 6 Feb 2025 5:42 UTC
31 points
10 comments · 5 min read · LW link

[Question] hypnosis question

KvmanThinking · 6 Feb 2025 2:41 UTC
3 points
5 comments · 1 min read · LW link

BIDA Calendar iCal Feed

jefftk · 6 Feb 2025 1:30 UTC
9 points
0 comments · 1 min read · LW link
(www.jefftk.com)

C’mon guys, Deliberate Practice is Real

Raemon · 5 Feb 2025 22:33 UTC
99 points
25 comments · 9 min read · LW link

The Risk of Gradual Disempowerment from AI

Zvi · 5 Feb 2025 22:10 UTC
87 points
20 comments · 20 min read · LW link
(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon · 5 Feb 2025 21:32 UTC
88 points
45 comments · 2 min read · LW link
(web.archive.org)

On AI Scaling

harsimony · 5 Feb 2025 20:24 UTC
6 points
3 comments · 8 min read · LW link
(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams · 5 Feb 2025 19:17 UTC
21 points
0 comments · 6 min read · LW link
(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox · 5 Feb 2025 18:58 UTC
19 points
0 comments · 11 min read · LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev · 5 Feb 2025 18:58 UTC
13 points
0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani · 5 Feb 2025 18:56 UTC
4 points
0 comments · 5 min read · LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay · 5 Feb 2025 18:17 UTC
6 points
7 comments · 1 min read · LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris · 5 Feb 2025 16:02 UTC
7 points
0 comments · 1 min read · LW link

Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety

jamesnorris · 5 Feb 2025 16:02 UTC
−9 points
2 comments · 1 min read · LW link

Language Models Use Trigonometry to Do Addition

Subhash Kantamneni · 5 Feb 2025 13:50 UTC
76 points
1 comment · 10 min read · LW link

Deploying the Observer will save humanity from existential threats

Aram Panasenco · 5 Feb 2025 10:39 UTC
−11 points
8 comments · 1 min read · LW link

The Domain of Orthogonality

mgfcatherall · 5 Feb 2025 8:14 UTC
1 point
0 comments · 7 min read · LW link

Reviewing LessWrong: Screwtape’s Basic Answer

Screwtape · 5 Feb 2025 4:30 UTC
97 points
18 comments · 6 min read · LW link

[Question] Why isn’t AI containment the primary AI safety strategy?

Oliver Kuperman · 5 Feb 2025 3:54 UTC
1 point
3 comments · 3 min read · LW link

Nick Land: Orthogonality

lumpenspace · 4 Feb 2025 21:07 UTC
5 points
37 comments · 8 min read · LW link

What working on AI safety taught me about B2B SaaS sales

purple fire · 4 Feb 2025 20:50 UTC
7 points
12 comments · 5 min read · LW link

Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker

4 Feb 2025 20:34 UTC
45 points
22 comments · 5 min read · LW link

Anti-Slop Interventions?

abramdemski · 4 Feb 2025 19:50 UTC
76 points
33 comments · 6 min read · LW link

Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails

Devina Jain · 4 Feb 2025 19:10 UTC
9 points
0 comments · 10 min read · LW link

[Question] Journalism student looking for sources

pinkerton · 4 Feb 2025 18:58 UTC
11 points
3 comments · 1 min read · LW link

We’re in Deep Research

Zvi · 4 Feb 2025 17:20 UTC
45 points
3 comments · 20 min read · LW link
(thezvi.wordpress.com)

The Capitalist Agent

henophilia · 4 Feb 2025 15:32 UTC
1 point
10 comments · 3 min read · LW link
(blog.hermesloom.org)

Forecasting AGI: Insights from Prediction Markets and Metaculus

Alvin Ånestrand · 4 Feb 2025 13:03 UTC
13 points
0 comments · 4 min read · LW link
(forecastingaifutures.substack.com)

Ruling Out Lookup Tables

Alfred Harwood · 4 Feb 2025 10:39 UTC
22 points
11 comments · 7 min read · LW link

Half-baked idea: a straightforward method for learning environmental goals?

Q Home · 4 Feb 2025 6:56 UTC
16 points
7 comments · 5 min read · LW link

Information Versus Action

Screwtape · 4 Feb 2025 5:13 UTC
27 points
0 comments · 6 min read · LW link

Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method

Clément L · 4 Feb 2025 4:15 UTC
6 points
1 comment · 13 min read · LW link

Tear Down the Burren

jefftk · 4 Feb 2025 3:40 UTC
45 points
2 comments · 2 min read · LW link
(www.jefftk.com)

Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog)

Archimedes · 4 Feb 2025 2:55 UTC
17 points
1 comment · 1 min read · LW link
(www.anthropic.com)

Can someone, anyone, make superintelligence a more concrete concept?

Ori Nagel · 4 Feb 2025 2:18 UTC
2 points
8 comments · 5 min read · LW link

What are the “no free lunch” theorems?

4 Feb 2025 2:02 UTC
19 points
4 comments · 1 min read · LW link
(aisafety.info)

eliminating bias through language?

KvmanThinking · 4 Feb 2025 1:52 UTC
1 point
12 comments · 1 min read · LW link

New Foresight Longevity Bio & Molecular Nano Grants Program

Allison Duettmann · 4 Feb 2025 0:28 UTC
11 points
0 comments · 1 min read · LW link

Meta: Frontier AI Framework

Zach Stein-Perlman · 3 Feb 2025 22:00 UTC
33 points
2 comments · 1 min read · LW link
(ai.meta.com)