Why Can’t We Hypothesize After the Fact?

David Udell

26 Feb 2025 22:41 UTC

40 points

3 comments · 2 min read · LW link

“AI Rapidly Gets Smarter, And Makes Some of Us Dumber,” from Sabine Hossenfelder

Evan_Gaensbauer

26 Feb 2025 22:33 UTC

4 points

9 comments · 2 min read · LW link

(youtu.be)

METR: AI models can be dangerous before public deployment

UnofficialLinkpostBot

26 Feb 2025 20:19 UTC

16 points

0 comments · 3 min read · LW link

(metr.org)

Representation Engineering has Its Problems, but None Seem Unsolvable

Lukasz G Bartoszcze

26 Feb 2025 19:53 UTC

15 points

1 comment · 3 min read · LW link

Thoughts that prompt good forecasts: A survey

Daniel_Friedrich

26 Feb 2025 18:36 UTC

1 point

0 comments · 1 min read · LW link

The non-tribal tribes

PatrickDFarley

26 Feb 2025 17:22 UTC

24 points

4 comments · 16 min read · LW link

SAE Training Dataset Influence in Feature Matching and a Hypothesis on Position Features

Seonglae Cho

26 Feb 2025 17:05 UTC

4 points

3 comments · 17 min read · LW link

Fuzzing LLMs sometimes makes them reveal their secrets

Fabien Roger

26 Feb 2025 16:48 UTC

65 points

13 comments · 9 min read · LW link

You can just wear a suit

lsusr

26 Feb 2025 14:57 UTC

139 points

59 comments · 2 min read · LW link

Matthew Yglesias—Misinformation Mostly Confuses Your Own Side

Siebe

26 Feb 2025 14:55 UTC

10 points

1 comment · 1 min read · LW link

(www.slowboring.com)

Optimizing Feedback to Learn Faster

Towards_Keeperhood

26 Feb 2025 14:24 UTC

12 points

0 comments · 2 min read · LW link

outlining is a historically recent underutilized gift to family

daijin

26 Feb 2025 13:58 UTC

4 points

2 comments · 3 min read · LW link

Osaka

lsusr

26 Feb 2025 13:50 UTC

78 points

13 comments · 1 min read · LW link

Time to Welcome Claude 3.7

Zvi

26 Feb 2025 13:00 UTC

49 points

2 comments · 24 min read · LW link

(thezvi.wordpress.com)

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik

26 Feb 2025 12:50 UTC

85 points

8 comments · 7 min read · LW link

Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions

Rareș Baron

26 Feb 2025 11:35 UTC

5 points

13 comments · 4 min read · LW link

[Question] Name for Standard AI Caveat?

yrimon

26 Feb 2025 7:07 UTC

6 points

5 comments · 1 min read · LW link

Levels of analysis for thinking about agency

Cole Wyeth

26 Feb 2025 4:24 UTC

11 points

0 comments · 7 min read · LW link

The Stag Hunt—cultivating cooperation to reap rewards

James Stephen Brown

25 Feb 2025 23:45 UTC

7 points

0 comments · 4 min read · LW link

(nonzerosum.games)

Three Levels for Large Language Model Cognition

Eleni Angelou

25 Feb 2025 23:14 UTC

21 points

0 comments · 5 min read · LW link

[Crosspost] Strategic wealth accumulation under transformative AI expectations

arden446 and CalebMaresca

25 Feb 2025 21:50 UTC

5 points

0 comments · 17 min read · LW link

(forum.effectivealtruism.org)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley and Owain_Evans

25 Feb 2025 17:39 UTC

334 points

92 comments · 4 min read · LW link

We Can Build Compassionate AI

Gordon Seidoh Worley

25 Feb 2025 16:37 UTC

9 points

6 comments · 4 min read · LW link

(uncertainupdates.substack.com)

[Question] Intellectual lifehacks repo

Antoine de Scorraille

25 Feb 2025 16:32 UTC

11 points

16 comments · 1 min read · LW link

Economics Roundup #5

Zvi

25 Feb 2025 13:40 UTC

27 points

10 comments · 20 min read · LW link

(thezvi.wordpress.com)

Making alignment a law of the universe

Richard Juggins

25 Feb 2025 10:44 UTC

6 points

3 comments · 15 min read · LW link

Revisiting Conway’s Law

annebrandes

25 Feb 2025 8:33 UTC

13 points

4 comments · 2 min read · LW link

Demystifying the Pinocchio Paradox

Novak Zukowski

25 Feb 2025 6:16 UTC

−1 points

0 comments · 3 min read · LW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

Juliezhanggg

25 Feb 2025 4:20 UTC

8 points

0 comments · 5 min read · LW link

Upcoming Protest for AI Safety

Matt Vincent

25 Feb 2025 3:04 UTC

12 points

0 comments · 1 min read · LW link

(www.pauseai-us.org)

what an efficient market feels from inside

DMMF

25 Feb 2025 2:38 UTC

41 points

9 comments · 6 min read · LW link

(danfrank.ca)

Metacompilation

Donald Hobson

24 Feb 2025 22:58 UTC

11 points

1 comment · 4 min read · LW link

The manifest manifesto

dkl9

24 Feb 2025 22:13 UTC

6 points

2 comments · 2 min read · LW link

(dkl9.net)

Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal

pocock

24 Feb 2025 21:28 UTC

3 points

0 comments · 1 min read · LW link

(juristgate.com)

Topological Data Analysis and Mechanistic Interpretability

Gunnar Carlsson

24 Feb 2025 19:56 UTC

16 points

4 comments · 7 min read · LW link

Zizian comparisons / connections in the open source & Linux communities

pocock

24 Feb 2025 19:55 UTC

−17 points

0 comments · 1 min read · LW link

Local Trust

ben_levinstein, Daniel Herrmann and Aydin Mohseni

24 Feb 2025 19:53 UTC

21 points

4 comments · 5 min read · LW link

Nationwide Action Workshop: Contact Congress about AI safety!

Felix De Simone

24 Feb 2025 19:36 UTC

7 points

0 comments · 1 min read · LW link

Anthropic releases Claude 3.7 Sonnet with extended thinking mode

LawrenceC

24 Feb 2025 19:32 UTC

88 points

8 comments · 4 min read · LW link

(www.anthropic.com)

Training AI to do alignment research we don’t already know how to do

joshc

24 Feb 2025 19:19 UTC

45 points

24 comments · 7 min read · LW link

Conference Report: Threshold 2030 - Modeling AI Economic Futures

Deric Cheng, Justin Bullock, Deger Turan and Elliot Mckernon

24 Feb 2025 18:56 UTC

52 points

0 comments · 10 min read · LW link

(www.convergenceanalysis.org)

Evaluating “What 2026 Looks Like” So Far

Jonny Spicer

24 Feb 2025 18:55 UTC

78 points

7 comments · 7 min read · LW link

Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?

Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott

24 Feb 2025 18:31 UTC

45 points

15 comments · 11 min read · LW link

Understanding Agent Preferences

martinkunev

24 Feb 2025 17:46 UTC

6 points

2 comments · 14 min read · LW link

What We Can Do to Prevent Extinction by AI

Joe Rogero

24 Feb 2025 17:15 UTC

12 points

0 comments · 11 min read · LW link

Dream, Truth, & Good

abramdemski

24 Feb 2025 16:59 UTC

50 points

11 comments · 4 min read · LW link

Forecasting Frontier Language Model Agent Capabilities

fidgetsinner, Axel Højmark, Jérémy Scheurer and Marius Hobbhahn

24 Feb 2025 16:51 UTC

35 points

0 comments · 5 min read · LW link

(www.apolloresearch.ai)

A City Within a City

Declan Molony

24 Feb 2025 15:51 UTC

62 points

1 comment · 7 min read · LW link

Grok Grok

Zvi

24 Feb 2025 14:20 UTC

36 points

2 comments · 19 min read · LW link

(thezvi.wordpress.com)

if you’re not happy single, you won’t be happy immortal

daijin

24 Feb 2025 13:23 UTC

2 points

1 comment · 1 min read · LW link