The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

mc1soft

Mar 22, 2025, 10:55 PM

3 points

0 comments 2 min read LW link

Dayton, Ohio, ACX Meetup

Lunawarrior

Mar 22, 2025, 7:45 PM

1 point

0 comments 1 min read LW link

[Replication] Crosscoder-based Stage-Wise Model Diffing

Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree and Jason Gross

Mar 22, 2025, 6:35 PM

21 points

0 comments 7 min read LW link

The Principle of Satisfying Foreknowledge

Randall Reams

Mar 22, 2025, 6:20 PM

1 point

0 comments 2 min read LW link

[Question] Urgency in the ITN framework

Shaïman

Mar 22, 2025, 6:16 PM

0 points

2 comments 1 min read LW link

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman

Mar 22, 2025, 6:16 PM

10 points

2 comments 6 min read LW link

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri

Mar 22, 2025, 6:07 PM

9 points

0 comments 12 min read LW link

100+ concrete projects and open problems in evals

Marius Hobbhahn

Mar 22, 2025, 3:21 PM

74 points

1 comment 1 min read LW link

Do models say what they learn?

Andy Arditi, marvinli, Joe Benton and Miles Turpin

Mar 22, 2025, 3:19 PM

126 points

12 comments 13 min read LW link

AGI Morality and Why It Is Unlikely to Emerge as a Feature of Superintelligence

funnyfranco

Mar 22, 2025, 12:06 PM

1 point

9 comments 18 min read LW link

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H

Mar 22, 2025, 10:54 AM

4 points

0 comments 2 min read LW link

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda

Mar 22, 2025, 10:13 AM

292 points

28 comments 4 min read LW link

(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien

Mar 22, 2025, 7:44 AM

0 points

0 comments 1 min read LW link

Legibility

lsusr

Mar 22, 2025, 6:54 AM

19 points

22 comments 2 min read LW link

Why Were We Wrong About China and AI? A Case Study in Failed Rationality

thedudeabides

Mar 22, 2025, 5:13 AM

31 points

45 comments 1 min read LW link

A Short Diatribe on Hidden Assertions.

Eggs

Mar 22, 2025, 3:14 AM

−9 points

2 comments 3 min read LW link

Transformer Attention’s High School Math Mistake

Max Ma

Mar 22, 2025, 12:16 AM

−13 points

1 comment 1 min read LW link

Making Sense of President Trump’s Annexation Obsession

Annapurna

Mar 21, 2025, 9:10 PM

−13 points

3 comments 5 min read LW link

(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio

Mar 21, 2025, 2:40 PM

91 points

7 comments 5 min read LW link

Prospects for Alignment Automation: Interpretability Case Study

Jacob Pfau and Geoffrey Irving

Mar 21, 2025, 2:05 PM

32 points

5 comments 8 min read LW link

Epoch AI released a GATE Scenario Explorer

Lee.aao

Mar 21, 2025, 1:57 PM

10 points

0 comments 1 min read LW link

(epoch.ai)

They Took MY Job?

Zvi

Mar 21, 2025, 1:30 PM

37 points

4 comments 9 min read LW link

(thezvi.wordpress.com)

Silly Time

jefftk

Mar 21, 2025, 12:30 PM

45 points

2 comments 2 min read LW link

(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo

Mar 21, 2025, 1:39 AM

96 points

44 comments 13 min read LW link

(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos

Mar 21, 2025, 12:34 AM

3 points

7 comments 1 min read LW link

A Critique of “Utility”

Zero Contradictions

Mar 20, 2025, 11:21 PM

−2 points

10 comments 2 min read LW link

(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn

Mar 20, 2025, 8:01 PM

195 points

5 comments 2 min read LW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot

Mar 20, 2025, 7:12 PM

16 points

3 comments 6 min read LW link

(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog

Mar 20, 2025, 5:12 PM

18 points

0 comments 2 min read LW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner

Mar 20, 2025, 5:12 PM

27 points

4 comments 13 min read LW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron

Mar 20, 2025, 4:18 PM

2 points

0 comments 1 min read LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King

Mar 20, 2025, 3:58 PM

20 points

21 comments 1 min read LW link

Human alignment

Lucien

Mar 20, 2025, 3:52 PM

−16 points

2 comments 1 min read LW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt

Mar 20, 2025, 2:31 PM

7 points

0 comments 1 min read LW link

AI #108: Straight Line on a Graph

Zvi

Mar 20, 2025, 1:50 PM

43 points

5 comments 39 min read LW link

(thezvi.wordpress.com)

What is an alignment tax?

Vishakha and Algon

Mar 20, 2025, 1:06 PM

5 points

0 comments 1 min read LW link

(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché

Mar 20, 2025, 12:20 PM

3 points

2 comments 21 min read LW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar

Mar 20, 2025, 10:44 AM

1 point

0 comments 1 min read LW link

Defense Against The Super-Worms

viemccoy

Mar 20, 2025, 7:24 AM

23 points

1 comment 2 min read LW link

Socially Graceful Degradation

Screwtape

Mar 20, 2025, 4:03 AM

57 points

9 comments 9 min read LW link

Apply to MATS 8.0!

Ryan Kidd and K Richards

Mar 20, 2025, 2:17 AM

63 points

5 comments 4 min read LW link

Improved visualizations of METR Time Horizons paper.

LDJ

Mar 19, 2025, 11:36 PM

20 points

4 comments 2 min read LW link

Is CCP authoritarianism good for building safe AI?

Hruss

Mar 19, 2025, 11:13 PM

1 point

0 comments 1 min read LW link

The case against “The case against AI alignment”

KvmanThinking

19 Mar 2025 22:40 UTC

2 points

0 comments 1 min read LW link

[Question] Superintelligence Strategy: A Pragmatic Path to… Doom?

Mr Beastly

19 Mar 2025 22:30 UTC

6 points

0 comments 3 min read LW link

SHIFT relies on token-level features to de-bias Bias in Bios probes

Tim Hua

19 Mar 2025 21:29 UTC

39 points

2 comments 6 min read LW link

Janet must die

Shmi

19 Mar 2025 20:35 UTC

12 points

3 comments 2 min read LW link

[Question] Why am I getting downvoted on Lesswrong?

Oxidize

19 Mar 2025 18:32 UTC

7 points

14 comments 1 min read LW link

Forecasting AI Futures Resource Hub

Alvin Ånestrand

19 Mar 2025 17:26 UTC

2 points

0 comments 2 min read LW link

(forecastingaifutures.substack.com)

TBC episode w Dave Kasten from Control AI on AI Policy

Eneasz

19 Mar 2025 17:09 UTC

8 points

0 comments 1 min read LW link

(www.thebayesianconspiracy.com)