“Think it Faster” worksheet

Raemon

8 Feb 2025 22:02 UTC

70 points

11 comments · 4 min read · LW link

Seven sources of goals in LLM agents

Seth Herd

8 Feb 2025 21:54 UTC

23 points

3 comments · 2 min read · LW link

[Question] p(s-risks to contemporary humans)?

MattAlexander

8 Feb 2025 21:19 UTC

6 points

5 comments · 6 min read · LW link

Cross-Layer Feature Alignment and Steering in Large Language Model

dlaptev

8 Feb 2025 20:18 UTC

9 points

0 comments · 6 min read · LW link

Towards building blocks of ontologies

Daniel C, Alex_Altair, Dalcy, Alfred Harwood and JoseFaustino

8 Feb 2025 16:03 UTC

29 points

0 comments · 26 min read · LW link

Can Knowledge Hurt You? The Dangers of Infohazards (and Exfohazards)

aggliu and Writer

8 Feb 2025 15:51 UTC

19 points

0 comments · 5 min read · LW link

(www.youtube.com)

Distilling the Internal Model Principle

JoseFaustino

8 Feb 2025 14:59 UTC

21 points

0 comments · 16 min read · LW link

Knocking Down My AI Optimist Strawman

tailcalled

8 Feb 2025 10:52 UTC

31 points

3 comments · 6 min read · LW link

Preserving Epistemic Novelty in AI: Experiments, Insights, and the Case for Decentralized Collective Intelligence

Andy E Williams

8 Feb 2025 10:25 UTC

−4 points

8 comments · 7 min read · LW link

Chaos Investments v0.31

Screwtape

8 Feb 2025 6:53 UTC

19 points

1 comment · 9 min read · LW link

AI Safety Oversights

Davey Morse

8 Feb 2025 6:15 UTC

3 points

0 comments · 1 min read · LW link

Wiki on Suspects in Lind, Zajko, and Maland Killings

Rebecca_Records

8 Feb 2025 4:16 UTC

20 points

4 comments · 1 min read · LW link

Research directions Open Phil wants to fund in technical AI safety

jake_mendel, maxnadeau and Peter Favaloro

8 Feb 2025 1:40 UTC

96 points

21 comments · 58 min read · LW link

(www.openphilanthropy.org)

So You Want To Make Marginal Progress...

johnswentworth

7 Feb 2025 23:22 UTC

311 points

42 comments · 4 min read · LW link

Reasons-based choice and cluelessness

JesseClifton

7 Feb 2025 22:21 UTC

35 points

0 comments · 10 min read · LW link

[Translation] In the Age of AI don’t Look for Unicorns

mushroomsoup

7 Feb 2025 21:06 UTC

3 points

0 comments · 10 min read · LW link

Racing Towards Fusion and AI

Jeffrey Heninger

7 Feb 2025 20:40 UTC

49 points

11 comments · 7 min read · LW link

‘High-Level Machine Intelligence’ and ‘Full Automation of Labor’ in the AI Impacts Surveys

Jeffrey Heninger

7 Feb 2025 20:40 UTC

11 points

1 comment · 7 min read · LW link

Request for Information for a new US AI Action Plan (OSTP RFI)

agucova

7 Feb 2025 20:40 UTC

5 points

0 comments · 2 min read · LW link

(www.federalregister.gov)

A Problem to Solve Before Building a Deception Detector

Eleni Angelou and lewis smith

7 Feb 2025 19:35 UTC

78 points

12 comments · 14 min read · LW link

Request for proposals: improving capability evaluations

cb

7 Feb 2025 18:51 UTC

1 point

0 comments · 1 min read · LW link

(www.openphilanthropy.org)

How AI Takeover Might Happen in 2 Years

joshc

7 Feb 2025 17:10 UTC

431 points

142 comments · 29 min read · LW link

(x.com)

the devil’s ontology

lostinwilliamsburg

7 Feb 2025 14:18 UTC

−1 points

14 comments · 6 min read · LW link

On the Meta and DeepMind Safety Frameworks

Zvi

7 Feb 2025 13:10 UTC

45 points

1 comment · 17 min read · LW link

(thezvi.wordpress.com)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

ChengCheng, Brendan Murphy, Adrià Garriga-alonso, Yashvardhan Sharma, dsbowen, smallsilo, Yawen Duan, ChrisCundy, Hannah Betts, AdamGleave and Kellin Pelrine

7 Feb 2025 3:57 UTC

37 points

0 comments · 10 min read · LW link

When you downvote, explain why

KvmanThinking

7 Feb 2025 1:03 UTC

7 points

31 comments · 1 min read · LW link

Medical Windfall Prizes

PeterMcCluskey

6 Feb 2025 23:33 UTC

5 points

1 comment · 5 min read · LW link

(bayesianinvestor.com)

Do No Harm? Navigating and Nudging AI Moral Choices

Sinem, pandelis and Adam Newgas

6 Feb 2025 19:18 UTC

11 points

0 comments · 9 min read · LW link

Open Philanthropy Technical AI Safety RFP - $40M Available Across 21 Research Areas

jake_mendel, maxnadeau and Peter Favaloro

6 Feb 2025 18:58 UTC

111 points

0 comments · 1 min read · LW link

(www.openphilanthropy.org)

AISN #47: Reasoning Models

Corin Katzke and Dan H

6 Feb 2025 18:52 UTC

3 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

Wild Animal Suffering Is The Worst Thing In The World

Bentham's Bulldog

6 Feb 2025 16:15 UTC

27 points

18 comments · 7 min read · LW link

Detecting Strategic Deception Using Linear Probes

Nicholas Goldowsky-Dill, bilalchughtai, StefanHex and Marius Hobbhahn

6 Feb 2025 15:46 UTC

104 points

9 comments · 2 min read · LW link

(arxiv.org)

AI #102: Made in America

Zvi

6 Feb 2025 14:20 UTC

26 points

18 comments · 67 min read · LW link

(thezvi.wordpress.com)

Biology, Ideology and Violence

Zero Contradictions

6 Feb 2025 11:26 UTC

−3 points

5 comments · 2 min read · LW link

(thewaywardaxolotl.blogspot.com)

MATS Applications + Research Directions I’m Currently Excited About

Neel Nanda

6 Feb 2025 11:03 UTC

73 points

7 comments · 8 min read · LW link

Don’t go bankrupt, don’t go rogue

Nathan Young

6 Feb 2025 10:31 UTC

20 points

1 comment · 7 min read · LW link

Voting Results for the 2023 Review

Raemon

6 Feb 2025 8:00 UTC

88 points

3 comments · 69 min read · LW link

Chicanery: No

Screwtape

6 Feb 2025 5:42 UTC

36 points

11 comments · 5 min read · LW link

[Question] hypnosis question

KvmanThinking

6 Feb 2025 2:41 UTC

3 points

5 comments · 1 min read · LW link

BIDA Calendar iCal Feed

jefftk

6 Feb 2025 1:30 UTC

10 points

0 comments · 1 min read · LW link

(www.jefftk.com)

C’mon guys, Deliberate Practice is Real

Raemon

5 Feb 2025 22:33 UTC

102 points

25 comments · 9 min read · LW link

The Risk of Gradual Disempowerment from AI

Zvi

5 Feb 2025 22:10 UTC

87 points

20 comments · 20 min read · LW link

(thezvi.wordpress.com)

Wired on: “DOGE personnel with admin access to Federal Payment System”

Raemon

5 Feb 2025 21:32 UTC

88 points

45 comments · 2 min read · LW link

(web.archive.org)

On AI Scaling

harsimony

5 Feb 2025 20:24 UTC

6 points

3 comments · 8 min read · LW link

(splittinginfinity.substack.com)

The State of Metaculus

ChristianWilliams

5 Feb 2025 19:17 UTC

21 points

0 comments · 6 min read · LW link

(www.metaculus.com)

Post-hoc reasoning in chain of thought

Kyle Cox

5 Feb 2025 18:58 UTC

20 points

0 comments · 11 min read · LW link

DeepSeek-R1 for Beginners

Anton Razzhigaev

5 Feb 2025 18:58 UTC

13 points

0 comments · 8 min read · LW link

Making the case for average-case AI Control

Nathaniel Mitrani

5 Feb 2025 18:56 UTC

5 points

0 comments · 5 min read · LW link

[Question] Alignment Paradox and a Request for Harsh Criticism

Bridgett Kay

5 Feb 2025 18:17 UTC

6 points

7 comments · 1 min read · LW link

Introducing International AI Governance Alliance (IAIGA)

jamesnorris

5 Feb 2025 16:02 UTC

7 points

0 comments · 1 min read · LW link