The Stag Hunt—cultivating cooperation to reap rewards

James Stephen Brown

25 Feb 2025 23:45 UTC

7 points

0 comments, 4 min read, LW link

(nonzerosum.games)

Three Levels for Large Language Model Cognition

Eleni Angelou

25 Feb 2025 23:14 UTC

21 points

0 comments, 5 min read, LW link

[Crosspost] Strategic wealth accumulation under transformative AI expectations

arden446 and CalebMaresca

25 Feb 2025 21:50 UTC

5 points

0 comments, 17 min read, LW link

(forum.effectivealtruism.org)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley and Owain_Evans

25 Feb 2025 17:39 UTC

335 points

92 comments, 4 min read, LW link

We Can Build Compassionate AI

Gordon Seidoh Worley

25 Feb 2025 16:37 UTC

9 points

6 comments, 4 min read, LW link

(uncertainupdates.substack.com)

[Question] Intellectual lifehacks repo

Antoine de Scorraille

25 Feb 2025 16:32 UTC

11 points

16 comments, 1 min read, LW link

Economics Roundup #5

Zvi

25 Feb 2025 13:40 UTC

27 points

10 comments, 20 min read, LW link

(thezvi.wordpress.com)

Making alignment a law of the universe

Richard Juggins

25 Feb 2025 10:44 UTC

6 points

3 comments, 15 min read, LW link

Revisiting Conway’s Law

annebrandes

25 Feb 2025 8:33 UTC

13 points

4 comments, 2 min read, LW link

Demystifying the Pinocchio Paradox

Novak Zukowski

25 Feb 2025 6:16 UTC

−1 points

0 comments, 3 min read, LW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

Juliezhanggg

25 Feb 2025 4:20 UTC

8 points

0 comments, 5 min read, LW link

Upcoming Protest for AI Safety

Matt Vincent

25 Feb 2025 3:04 UTC

12 points

0 comments, 1 min read, LW link

(www.pauseai-us.org)

what an efficient market feels from inside

DMMF

25 Feb 2025 2:38 UTC

41 points

9 comments, 6 min read, LW link

(danfrank.ca)

Metacompilation

Donald Hobson

24 Feb 2025 22:58 UTC

11 points

1 comment, 4 min read, LW link

The manifest manifesto

dkl9

24 Feb 2025 22:13 UTC

6 points

2 comments, 2 min read, LW link

(dkl9.net)

Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal

pocock

24 Feb 2025 21:28 UTC

3 points

0 comments, 1 min read, LW link

(juristgate.com)

Topological Data Analysis and Mechanistic Interpretability

Gunnar Carlsson

24 Feb 2025 19:56 UTC

17 points

4 comments, 7 min read, LW link

Zizian comparisons / connections in the open source & Linux communities

pocock

24 Feb 2025 19:55 UTC

−17 points

0 comments, 1 min read, LW link

Local Trust

ben_levinstein, Daniel Herrmann and Aydin Mohseni

24 Feb 2025 19:53 UTC

21 points

4 comments, 5 min read, LW link

Nationwide Action Workshop: Contact Congress about AI safety!

Felix De Simone

24 Feb 2025 19:36 UTC

7 points

0 comments, 1 min read, LW link

Anthropic releases Claude 3.7 Sonnet with extended thinking mode

LawrenceC

24 Feb 2025 19:32 UTC

88 points

8 comments, 4 min read, LW link

(www.anthropic.com)

Training AI to do alignment research we don’t already know how to do

joshc

24 Feb 2025 19:19 UTC

45 points

24 comments, 7 min read, LW link

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Deric Cheng, Justin Bullock, Deger Turan and Elliot Mckernon

24 Feb 2025 18:56 UTC

52 points

0 comments, 10 min read, LW link

(www.convergenceanalysis.org)

Evaluating “What 2026 Looks Like” So Far

Jonny Spicer

24 Feb 2025 18:55 UTC

79 points

7 comments, 7 min read, LW link

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott

24 Feb 2025 18:31 UTC

45 points

15 comments, 11 min read, LW link

Understanding Agent Preferences

martinkunev

24 Feb 2025 17:46 UTC

6 points

2 comments, 14 min read, LW link

What We Can Do to Prevent Extinction by AI

Joe Rogero

24 Feb 2025 17:15 UTC

13 points

0 comments, 11 min read, LW link

Dream, Truth, & Good

abramdemski

24 Feb 2025 16:59 UTC

50 points

11 comments, 4 min read, LW link

Forecasting Frontier Language Model Agent Capabilities

fidgetsinner, Axel Højmark, Jérémy Scheurer and Marius Hobbhahn

24 Feb 2025 16:51 UTC

35 points

0 comments, 5 min read, LW link

(www.apolloresearch.ai)

A City Within a City

Declan Molony

24 Feb 2025 15:51 UTC

67 points

2 comments, 7 min read, LW link

Grok Grok

Zvi

24 Feb 2025 14:20 UTC

36 points

2 comments, 19 min read, LW link

(thezvi.wordpress.com)

if you’re not happy single, you won’t be happy immortal

daijin

24 Feb 2025 13:23 UTC

2 points

1 comment, 1 min read, LW link

[NSFW] The Fuzzy Handcuffs of Liberation

lsusr

24 Feb 2025 13:05 UTC

24 points

11 comments, 2 min read, LW link

Dayton, Ohio, HPMOR 10 year Anniversary meetup

Lunawarrior

24 Feb 2025 12:55 UTC

1 point

0 comments, 1 min read, LW link

An Alternate History of the Future, 2025-2040

Mr Beastly

24 Feb 2025 5:53 UTC

5 points

11 comments, 10 min read, LW link

Export Surplusses

lsusr

24 Feb 2025 5:53 UTC

29 points

21 comments, 3 min read, LW link

AI alignment for mental health supports

hiki_t

24 Feb 2025 4:21 UTC

1 point

1 comment, 1 min read, LW link

The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research

Arthur Conmy and Neel Nanda

24 Feb 2025 2:17 UTC

48 points

1 comment, 7 min read, LW link

Poll on AI opinions.

Niclas Kupper

23 Feb 2025 22:39 UTC

1 point

2 comments, 1 min read, LW link

The Geometry of Linear Regression versus PCA

criticalpoints

23 Feb 2025 21:01 UTC

20 points

7 comments, 6 min read, LW link

(eregis.github.io)

Judgements: Merging Prediction & Evidence

abramdemski

23 Feb 2025 19:35 UTC

109 points

9 comments, 6 min read, LW link

Intelligence as Privilege Escalation

Cole Wyeth

23 Feb 2025 19:31 UTC

29 points

2 comments, 5 min read, LW link

[Question] Have LLMs Generated Novel Insights?

abramdemski and Cole Wyeth

23 Feb 2025 18:22 UTC

171 points

45 comments, 2 min read, LW link

The case for corporal punishment

Yair Halberstadt

23 Feb 2025 15:05 UTC

28 points

5 comments, 2 min read, LW link

Reflections on the state of the race to superintelligence, February 2025

Mitchell_Porter

23 Feb 2025 13:58 UTC

22 points

7 comments, 4 min read, LW link

List of most interesting ideas I encountered in my life, ranked

Lucien

23 Feb 2025 12:36 UTC

21 points

6 comments, 1 min read, LW link

Test of the Bene Gesserit

lsusr

23 Feb 2025 11:51 UTC

19 points

3 comments, 3 min read, LW link

Moral gauge theory: A speculative suggestion for AI alignment

James Diacoumis

23 Feb 2025 11:42 UTC

6 points

3 comments, 8 min read, LW link

[Question] Does human (mis)alignment pose a significant and imminent existential threat?

jr

23 Feb 2025 10:03 UTC

6 points

3 comments, 1 min read, LW link

Deep sparse autoencoders yield interpretable features too

Armaan A. Abraham

23 Feb 2025 5:46 UTC

31 points

8 comments, 8 min read, LW link