Outlaw Code

Commander Zander

30 Jan 2025 23:41 UTC

10 points

1 comment · 2 min read · LW link

Can someone, anyone, make superintelligence a more concrete concept?

Ori Nagel

30 Jan 2025 23:25 UTC

3 points

6 comments · 4 min read · LW link

Upcoming Neuroscience Workshop—Functionalizing Brain Data, Ground-Truthing, and the Role of Artificial Data in Advancing Neuroscience

Devin Ward

30 Jan 2025 23:02 UTC

1 point

0 comments · 1 min read · LW link

What’s Behind the SynBio Bust?

sarahconstantin

30 Jan 2025 22:30 UTC

55 points

8 comments · 6 min read · LW link

(sarahconstantin.substack.com)

The future of humanity is in management

jasoncrawford

30 Jan 2025 22:14 UTC

4 points

5 comments · 13 min read · LW link

(newsletter.rootsofprogress.org)

[Translation] AI Generated Fake News is Taking Over my Family Group Chat

mushroomsoup

30 Jan 2025 20:24 UTC

3 points

0 comments · 6 min read · LW link

A sketch of an AI control safety case

Tomek Korbak, joshc, Benjamin Hilton, Buck and Geoffrey Irving

30 Jan 2025 17:28 UTC

61 points

0 comments · 5 min read · LW link

Gradual Disempowerment: Systemic Existential Risks from Incremental AI Development

Jan_Kulveit, Raymond Douglas, Nora_Ammann, Deger Turan, David Scott Krueger and David Duvenaud

30 Jan 2025 17:03 UTC

204 points

66 comments · 2 min read · LW link

(gradual-disempowerment.ai)

[Question] Implication of Uncomputable Problems

Nathan1123

30 Jan 2025 16:48 UTC

−3 points

3 comments · 1 min read · LW link

Hello World

Charlie Sanders

30 Jan 2025 15:33 UTC

7 points

0 comments · 2 min read · LW link

(www.dailymicrofiction.com)

Introducing the Coalition for a Baruch Plan for AI: A Call for a Radical Treaty-Making process for the Global Governance of AI

rguerreschi

30 Jan 2025 15:26 UTC

11 points

0 comments · 2 min read · LW link

AI #101: The Shallow End

Zvi

30 Jan 2025 14:50 UTC

39 points

1 comment · 59 min read · LW link

(thezvi.wordpress.com)

Memorization-generalization in practice

Dmitry Vaintrob

30 Jan 2025 14:10 UTC

7 points

1 comment · 4 min read · LW link

ARENA 5.0 - Call for Applicants

JamesH, James Fox, CallumMcDougall, Chloe Li and David Quarel

30 Jan 2025 13:18 UTC

35 points

2 comments · 6 min read · LW link

You should read Hobbes, Locke, Hume, and Mill via EarlyModernTexts.com

Arjun Panickssery

30 Jan 2025 12:35 UTC

54 points

3 comments · 3 min read · LW link

(arjunpanickssery.substack.com)

[Question] Should you publish solutions to corrigibility?

rvnnt

30 Jan 2025 11:52 UTC

13 points

13 comments · 1 min read · LW link

Tetherware #1: The case for humanlike AI with free will

Jáchym Fibír

30 Jan 2025 10:58 UTC

5 points

15 comments · 10 min read · LW link

(tetherware.substack.com)

A High Level Closed-Door Session Discussing DeepSeek: Vision Trumps Technology

Cosmia_Nebula

30 Jan 2025 9:53 UTC

30 points

1 comment · 8 min read · LW link

(rentry.co)

Are we the Wolves now? Human Eugenics under AI Control

Brit

30 Jan 2025 8:31 UTC

−1 points

2 comments · 2 min read · LW link

[Question] Why not train reasoning models with RLHF?

Caleb Biddulph

30 Jan 2025 7:58 UTC

4 points

4 comments · 1 min read · LW link

The Road to Evil Is Paved with Good Objectives: Framework to Classify and Fix Misalignments.

Shivam

30 Jan 2025 2:44 UTC

1 point

0 comments · 11 min read · LW link

How exactly can AI take your job in the next few years?

Ansh Juneja

30 Jan 2025 2:33 UTC

9 points

0 comments · 21 min read · LW link

Absorbing Your Friends’ Powers

Alice Blair

30 Jan 2025 2:32 UTC

8 points

1 comment · 2 min read · LW link

Detailed Ideal World Benchmark

Knight Lee

30 Jan 2025 2:31 UTC

5 points

2 comments · 2 min read · LW link

Fertility Will Never Recover

Eneasz

30 Jan 2025 1:16 UTC

35 points

39 comments · 2 min read · LW link

(deathisbad.substack.com)

Predation as Payment for Criticism

Benquo

30 Jan 2025 1:06 UTC

10 points

6 comments · 1 min read · LW link

(benjaminrosshoffman.com)

Learn to Develop Your Advantage

ReverendBayes

29 Jan 2025 22:06 UTC

16 points

1 comment · 5 min read · LW link

Revealing alignment faking with a single prompt

Florian_Dietz

29 Jan 2025 21:01 UTC

9 points

5 comments · 4 min read · LW link

Allegory of the Tsunami

Evan Hu

29 Jan 2025 19:09 UTC

4 points

1 comment · 3 min read · LW link

My Mental Model of AI Optimist Opinions

tailcalled

29 Jan 2025 18:44 UTC

14 points

7 comments · 1 min read · LW link

Planning for Extreme AI Risks

joshc

29 Jan 2025 18:33 UTC

143 points

5 comments · 16 min read · LW link

Dario Amodei: On DeepSeek and Export Controls

Zach Stein-Perlman

29 Jan 2025 17:15 UTC

53 points

3 comments · 1 min read · LW link

(darioamodei.com)

Anthropic CEO calls for RSI

Andrea_Miotti

29 Jan 2025 16:54 UTC

32 points

10 comments · 1 min read · LW link

(darioamodei.com)

Efficiency spectra and “bucket of circuits” cartoons

Dmitry Vaintrob

29 Jan 2025 15:06 UTC

20 points

0 comments · 7 min read · LW link

DeepSeek: Lemon, It’s Wednesday

Zvi

29 Jan 2025 15:00 UTC

33 points

0 comments · 33 min read · LW link

(thezvi.wordpress.com)

How To Prevent a Dystopia

ank

29 Jan 2025 14:16 UTC

−3 points

4 comments · 1 min read · LW link

Whereby: The Zoom alternative you probably haven’t heard of

Itay Dreyfus

29 Jan 2025 13:01 UTC

4 points

0 comments · 7 min read · LW link

(productidentity.co)

[Question] Whose track record of AI predictions would you like to see evaluated?

Jonny Spicer

29 Jan 2025 12:05 UTC

2 points

3 comments · 1 min read · LW link

Paper: Open Problems in Mechanistic Interpretability

Lee Sharkey and bilalchughtai

29 Jan 2025 10:25 UTC

71 points

0 comments · 1 min read · LW link

(arxiv.org)

Positive jailbreaks in LLMs

dereshev

29 Jan 2025 8:41 UTC

6 points

0 comments · 4 min read · LW link

Untrusted monitoring insights from watching ChatGPT play coordination games

jwfiredragon

29 Jan 2025 4:53 UTC

14 points

8 comments · 9 min read · LW link

The Game Board has been Flipped: Now is a good time to rethink what you’re doing

LintzA

28 Jan 2025 23:36 UTC

118 points

30 comments · 13 min read · LW link

Reconceptualizing the Nothingness and Existence

Htarlov

28 Jan 2025 20:29 UTC

8 points

1 comment · 2 min read · LW link

Fake thinking and real thinking

Joe Carlsmith

28 Jan 2025 20:05 UTC

120 points

17 comments · 38 min read · LW link

SAE regularization produces more interpretable models

Peter Lai and StefanHex

28 Jan 2025 20:02 UTC

21 points

7 comments · 4 min read · LW link

Operator

Zvi

28 Jan 2025 20:00 UTC

35 points

1 comment · 11 min read · LW link

(thezvi.wordpress.com)

DeepSeek Panic at the App Store

Zvi

28 Jan 2025 19:30 UTC

51 points

14 comments · 33 min read · LW link

(thezvi.wordpress.com)

“Sharp Left Turn” discourse: An opinionated review

Steven Byrnes

28 Jan 2025 18:47 UTC

228 points

31 comments · 31 min read · LW link

Detecting out of distribution text with surprisal and entropy

Sandy Fraser

28 Jan 2025 18:46 UTC

24 points

4 comments · 11 min read · LW link

Should Art Carry the Weight of Shaping our Values?

Krishna Maneesha Dendukuri

28 Jan 2025 18:43 UTC

2 points

0 comments · 3 min read · LW link