Theory of Change for AI Safety Camp

Linda Linsefors

22 Jan 2025 22:07 UTC

36 points

3 comments

7 min read

LW link

On DeepSeek’s r1

Zvi

22 Jan 2025 19:50 UTC

55 points

2 comments

35 min read

LW link

(thezvi.wordpress.com)

Detect Goodhart and shut down

Jeremy Gillen

22 Jan 2025 18:45 UTC

71 points

21 comments

7 min read

LW link

Recursive Self-Modeling as a Plausible Mechanism for Real-time Introspection in Current Language Models

rife

22 Jan 2025 18:36 UTC

14 points

6 comments

2 min read

LW link

The Fundamental Circularity Theorem: Why Some Mathematical Behaviours Are Inherently Unprovable

Alister Munday

22 Jan 2025 18:20 UTC

−11 points

2 comments

4 min read

LW link

The Dead Cradle Theory: Why Earth May Not Survive Humanity’s Expansion into Space

Nicholas Andresen

22 Jan 2025 17:43 UTC

10 points

1 comment

11 min read

LW link

The Functionalist Case for Machine Consciousness: Evidence from Large Language Models

James Diacoumis

22 Jan 2025 17:43 UTC

17 points

24 comments

9 min read

LW link

Mechanisms too simple for humans to design

Malmesbury

22 Jan 2025 16:54 UTC

221 points

47 comments

15 min read

LW link

Training Data Attribution: Examining Its Adoption & Use Cases

Deric Cheng, Justin Bullock and David_Kristoffersson

22 Jan 2025 15:41 UTC

12 points

0 comments

3 min read

LW link

(www.convergenceanalysis.org)

Training Data Attribution (TDA): Examining Its Adoption & Use Cases

Deric Cheng, Justin Bullock and David_Kristoffersson

22 Jan 2025 15:40 UTC

16 points

0 comments

3 min read

LW link

(www.convergenceanalysis.org)

The Quantum Mars Teleporter: An Empirical Test Of Personal Identity Theories

avturchin

22 Jan 2025 11:48 UTC

10 points

18 comments

2 min read

LW link

Bayesian Reasoning on Maps

Sjlver

22 Jan 2025 10:45 UTC

4 points

0 comments

4 min read

LW link

(blog.purpureus.net)

Against blanket arguments against interpretability

Dmitry Vaintrob

22 Jan 2025 9:46 UTC

54 points

4 comments

7 min read

LW link

The real political spectrum

Hzn

22 Jan 2025 8:55 UTC

−14 points

0 comments

1 min read

LW link

Evolution and the Low Road to Nash

Aydin Mohseni, ben_levinstein and Daniel Herrmann

22 Jan 2025 7:06 UTC

46 points

2 comments

10 min read

LW link

The Human Alignment Problem for AIs

rife

22 Jan 2025 4:06 UTC

12 points

5 comments

3 min read

LW link

When does capability elicitation bound risk?

joshc

22 Jan 2025 3:42 UTC

25 points

0 comments

17 min read

LW link

(redwoodresearch.substack.com)

[Question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?

Q Home

22 Jan 2025 3:30 UTC

5 points

0 comments

1 min read

LW link

Kitchen Air Purifier Comparison

jefftk

22 Jan 2025 3:20 UTC

35 points

2 comments

3 min read

LW link

(www.jefftk.com)

November-December 2024 Progress in Guaranteed Safe AI

Quinn

22 Jan 2025 1:20 UTC

17 points

0 comments

4 min read

LW link

(gsai.substack.com)

Quotes from the Stargate press conference

Nikola Jurkovic

22 Jan 2025 0:50 UTC

149 points

7 comments

1 min read

LW link

(www.c-span.org)

Tell me about yourself: LLMs are aware of their learned behaviors

Martín Soto and Owain_Evans

22 Jan 2025 0:47 UTC

136 points

5 comments

6 min read

LW link

Training on Documents About Reward Hacking Induces Reward Hacking

evhub and Nathan Hu

21 Jan 2025 21:32 UTC

135 points

15 comments

2 min read

LW link

(alignment.anthropic.com)

Veo-2 Can Produce Realistic Ads

Logan Riggs

21 Jan 2025 19:13 UTC

14 points

0 comments

1 min read

LW link

Computational Limits on Efficiency

vibhumeh

21 Jan 2025 18:29 UTC

8 points

1 comment

5 min read

LW link

Democratizing AI Governance: Balancing Expertise and Public Participation

Lucile Ter-Minassian

21 Jan 2025 18:29 UTC

2 points

0 comments

15 min read

LW link

Hitler was not a monster

halgir

21 Jan 2025 18:21 UTC

−12 points

5 comments

1 min read

LW link

Natural Intelligence is Overhyped

Collisteru

21 Jan 2025 18:09 UTC

15 points

0 comments

7 min read

LW link

14+ AI Safety Advisors You Can Speak to – New AISafety.com Resource

Bryce Robertson and Søren Elverlin

21 Jan 2025 17:34 UTC

24 points

0 comments

1 min read

LW link

[Linkpost] Why AI Safety Camp struggles with fundraising (FBB #2)

gergogaspar

21 Jan 2025 17:27 UTC

3 points

0 comments

1 min read

LW link

The Manhattan Trap: Why a Race to Artificial Superintelligence is Self-Defeating

Corin Katzke and GideonF

21 Jan 2025 16:57 UTC

92 points

11 comments

2 min read

LW link

(www.convergenceanalysis.org)

Links and short notes, 2025-01-20

jasoncrawford

21 Jan 2025 16:10 UTC

8 points

0 comments

1 min read

LW link

(newsletter.rootsofprogress.org)

The Case Against AI Control Research

johnswentworth

21 Jan 2025 16:03 UTC

433 points

85 comments

6 min read

LW link

Will AI Resilience protect Developing Nations?

edgecase64

21 Jan 2025 15:31 UTC

4 points

0 comments

8 min read

LW link

Sleep, Diet, Exercise and GLP-1 Drugs

Zvi

21 Jan 2025 12:20 UTC

41 points

6 comments

18 min read

LW link

(thezvi.wordpress.com)

We don’t want to post again “This might be the last AI Safety Camp”

Remmelt, Linda Linsefors and Robert Kralisch

21 Jan 2025 12:03 UTC

36 points

17 comments

1 min read

LW link

(manifund.org)

On Responsibility

silentbob

21 Jan 2025 10:47 UTC

15 points

2 comments

6 min read

LW link

The ‘anti woke’ are positioned to win but can they capitalize?

Hzn

21 Jan 2025 9:52 UTC

−8 points

0 comments

2 min read

LW link

Almost all growth is exponential growth

lemonhope

21 Jan 2025 7:16 UTC

41 points

7 comments

1 min read

LW link

Arbitrage Drains Worse Markets to Feeds Better Ones

Cedar

21 Jan 2025 3:44 UTC

25 points

1 comment

1 min read

LW link

On Contact, Part 1

james.lucassen

21 Jan 2025 3:10 UTC

14 points

1 comment

11 min read

LW link

Retrospective: 12 [sic] Months Since MIRI

james.lucassen

21 Jan 2025 2:52 UTC

68 points

0 comments

9 min read

LW link

Easily Evaluate SAE-Steered Models with EleutherAI Evaluation Harness

Matthew Khoriaty

21 Jan 2025 2:02 UTC

8 points

0 comments

3 min read

LW link

Why We Need More Shovel-Ready AI Notkilleveryoneism Megaproject Proposals

Peter Berggren

20 Jan 2025 22:38 UTC

36 points

1 comment

6 min read

LW link

Tips and Code for Empirical Research Workflows

John Hughes and Ethan Perez

20 Jan 2025 22:31 UTC

110 points

17 comments

20 min read

LW link

Lecture Series on Tiling Agents #2

abramdemski

20 Jan 2025 21:02 UTC

16 points

0 comments

1 min read

LW link

Announcement: Learning Theory Online Course

Yegreg and Alex Flint

20 Jan 2025 19:55 UTC

63 points

33 comments

4 min read

LW link

The Hidden Status Game in Hospital Slacking

EpistemicExplorer

20 Jan 2025 18:35 UTC

2 points

4 comments

3 min read

LW link

Monthly Roundup #26: January 2025

Zvi

20 Jan 2025 15:30 UTC

34 points

15 comments

43 min read

LW link

(thezvi.wordpress.com)

Things I have been using LLMs for

Kaj_Sotala

20 Jan 2025 14:20 UTC

51 points

13 comments

7 min read

LW link

(kajsotala.fi)