
OpenAI releases GPT-4.5

Seth Herd 27 Feb 2025 21:40 UTC

34 points

12 comments · 3 min read · LW link

(openai.com)

The Elicitation Game: Evaluating capability elicitation techniques

Teun van der Weij, Felix Hofstätter, JaydenTeoh, HenningB and Francis Rhys Ward

27 Feb 2025 20:33 UTC

15 points

1 comment · 2 min read · LW link

For the Sake of Pleasure Alone

Greenless Mirror 27 Feb 2025 20:07 UTC

−1 points

17 comments · 12 min read · LW link

Keeping AI Subordinate to Human Thought: A Proposal for Public AI Conversations

syh 27 Feb 2025 20:00 UTC

−1 points

0 comments · 1 min read · LW link

(medium.com)

How to Corner Liars: A Miasma-Clearing Protocol

ymeskhout 27 Feb 2025 17:18 UTC

70 points

24 comments · 7 min read · LW link

(www.ymeskhout.com)

Economic Topology, ASI, and the Separation Equilibrium

mkualquiera 27 Feb 2025 16:36 UTC

2 points

11 comments · 6 min read · LW link

The Illusion of Iterative Improvement: Why AI (and Humans) Fail to Track Their Own Epistemic Drift

Andy E Williams 27 Feb 2025 16:26 UTC

1 point

3 comments · 4 min read · LW link

AI #105: Hey There Alexa

Zvi 27 Feb 2025 14:30 UTC

31 points

3 comments · 40 min read · LW link

(thezvi.wordpress.com)

Space-Faring Civilization density estimates and models—Review

Maxime Riché 27 Feb 2025 11:44 UTC

20 points

0 comments · 12 min read · LW link

Market Capitalization is Semantically Invalid

Zero Contradictions 27 Feb 2025 11:27 UTC

3 points

14 comments · 3 min read · LW link

(thewaywardaxolotl.blogspot.com)

Proposing Human Survival Strategy based on the NAIA Vision: Toward the Co-evolution of Diverse Intelligences

Hiroshi Yamakawa 27 Feb 2025 5:18 UTC

0 points

0 comments · 11 min read · LW link

Short & long term tradeoffs of strategic voting

kaleb 27 Feb 2025 4:25 UTC

2 points

0 comments · 8 min read · LW link

Recursive alignment with the principle of alignment

hive 27 Feb 2025 2:34 UTC

13 points

4 comments · 15 min read · LW link

(hiveism.substack.com)

Kingfisher Tour February 2025

jefftk 27 Feb 2025 2:20 UTC

9 points

0 comments · 4 min read · LW link

(www.jefftk.com)

You should use Consumer Reports

KvmanThinking 27 Feb 2025 1:52 UTC

7 points

5 comments · 1 min read · LW link

Universal AI Maximizes Variational Empowerment: New Insights into AGI Safety

Yusuke Hayashi 27 Feb 2025 0:46 UTC

14 points

1 comment · 4 min read · LW link

Why Can’t We Hypothesize After the Fact?

David Udell 26 Feb 2025 22:41 UTC

40 points

3 comments · 2 min read · LW link

“AI Rapidly Gets Smarter, And Makes Some of Us Dumber,” from Sabine Hossenfelder

Evan_Gaensbauer 26 Feb 2025 22:33 UTC

4 points

9 comments · 2 min read · LW link

(youtu.be)

METR: AI models can be dangerous before public deployment

UnofficialLinkpostBot 26 Feb 2025 20:19 UTC

16 points

0 comments · 3 min read · LW link

(metr.org)

Representation Engineering has Its Problems, but None Seem Unsolvable

Lukasz G Bartoszcze 26 Feb 2025 19:53 UTC

15 points

1 comment · 3 min read · LW link

Thoughts that prompt good forecasts: A survey

Daniel_Friedrich 26 Feb 2025 18:36 UTC

1 point

0 comments · 1 min read · LW link

The non-tribal tribes

PatrickDFarley 26 Feb 2025 17:22 UTC

24 points

4 comments · 16 min read · LW link

SAE Training Dataset Influence in Feature Matching and a Hypothesis on Position Features

Seonglae Cho 26 Feb 2025 17:05 UTC

4 points

3 comments · 17 min read · LW link

Fuzzing LLMs sometimes makes them reveal their secrets

Fabien Roger 26 Feb 2025 16:48 UTC

66 points

13 comments · 9 min read · LW link

You can just wear a suit

lsusr 26 Feb 2025 14:57 UTC

165 points

64 comments · 2 min read · LW link

Matthew Yglesias—Misinformation Mostly Confuses Your Own Side

Siebe 26 Feb 2025 14:55 UTC

10 points

1 comment · 1 min read · LW link

(www.slowboring.com)

Optimizing Feedback to Learn Faster

Towards_Keeperhood 26 Feb 2025 14:24 UTC

12 points

0 comments · 2 min read · LW link

outlining is a historically recent underutilized gift to family

daijin 26 Feb 2025 13:58 UTC

4 points

2 comments · 3 min read · LW link

Osaka

lsusr 26 Feb 2025 13:50 UTC

80 points

13 comments · 1 min read · LW link

Time to Welcome Claude 3.7

Zvi 26 Feb 2025 13:00 UTC

49 points

2 comments · 24 min read · LW link

(thezvi.wordpress.com)

[PAPER] Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations

Lucy Farnik 26 Feb 2025 12:50 UTC

85 points

8 comments · 7 min read · LW link

Minor interpretability exploration #1: Grokking of modular addition, subtraction, multiplication, for different activation functions

Rareș Baron 26 Feb 2025 11:35 UTC

5 points

13 comments · 4 min read · LW link

[Question] Name for Standard AI Caveat?

yrimon 26 Feb 2025 7:07 UTC

6 points

5 comments · 1 min read · LW link

Levels of analysis for thinking about agency

Cole Wyeth 26 Feb 2025 4:24 UTC

11 points

0 comments · 7 min read · LW link

The Stag Hunt—cultivating cooperation to reap rewards

James Stephen Brown 25 Feb 2025 23:45 UTC

7 points

0 comments · 4 min read · LW link

(nonzerosum.games)

Three Levels for Large Language Model Cognition

Eleni Angelou 25 Feb 2025 23:14 UTC

21 points

0 comments · 5 min read · LW link

[Crosspost] Strategic wealth accumulation under transformative AI expectations

arden446 and CalebMaresca

25 Feb 2025 21:50 UTC

5 points

0 comments · 17 min read · LW link

(forum.effectivealtruism.org)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Jan Betley and Owain_Evans

25 Feb 2025 17:39 UTC

335 points

92 comments · 4 min read · LW link

We Can Build Compassionate AI

Gordon Seidoh Worley 25 Feb 2025 16:37 UTC

9 points

6 comments · 4 min read · LW link

(uncertainupdates.substack.com)

[Question] Intellectual lifehacks repo

Antoine de Scorraille 25 Feb 2025 16:32 UTC

11 points

16 comments · 1 min read · LW link

Economics Roundup #5

Zvi 25 Feb 2025 13:40 UTC

27 points

10 comments · 20 min read · LW link

(thezvi.wordpress.com)

Making alignment a law of the universe

Richard Juggins 25 Feb 2025 10:44 UTC

6 points

3 comments · 15 min read · LW link

Revisiting Conway’s Law

annebrandes 25 Feb 2025 8:33 UTC

13 points

4 comments · 2 min read · LW link

Demystifying the Pinocchio Paradox

Novak Zukowski 25 Feb 2025 6:16 UTC

−1 points

0 comments · 3 min read · LW link

Technical comparison of Deepseek, Novasky, S1, Helix, P0

Juliezhanggg 25 Feb 2025 4:20 UTC

8 points

0 comments · 5 min read · LW link

Upcoming Protest for AI Safety

Matt Vincent 25 Feb 2025 3:04 UTC

12 points

0 comments · 1 min read · LW link

(www.pauseai-us.org)

what an efficient market feels from inside

DMMF 25 Feb 2025 2:38 UTC

41 points

9 comments · 6 min read · LW link

(danfrank.ca)

Metacompilation

Donald Hobson 24 Feb 2025 22:58 UTC

11 points

1 comment · 4 min read · LW link

The manifest manifesto

dkl9 24 Feb 2025 22:13 UTC

6 points

2 comments · 2 min read · LW link

(dkl9.net)

Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal

pocock 24 Feb 2025 21:28 UTC

3 points

0 comments · 1 min read · LW link

(juristgate.com)