Making Sense of President Trump’s Annexation Obsession

Annapurna

21 Mar 2025 21:10 UTC

−13 points

3 comments

5 min read

LW link

(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio

21 Mar 2025 14:40 UTC

91 points

7 comments

5 min read

LW link

Prospects for Alignment Automation: Interpretability Case Study

Jacob Pfau and Geoffrey Irving

21 Mar 2025 14:05 UTC

32 points

5 comments

8 min read

LW link

Epoch AI released a GATE Scenario Explorer

Lee.aao

21 Mar 2025 13:57 UTC

10 points

0 comments

1 min read

LW link

(epoch.ai)

They Took MY Job?

Zvi

21 Mar 2025 13:30 UTC

37 points

4 comments

9 min read

LW link

(thezvi.wordpress.com)

Silly Time

jefftk

21 Mar 2025 12:30 UTC

45 points

2 comments

2 min read

LW link

(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo

21 Mar 2025 1:39 UTC

105 points

51 comments

13 min read

LW link

(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos

21 Mar 2025 0:34 UTC

3 points

7 comments

1 min read

LW link

A Critique of “Utility”

Zero Contradictions

20 Mar 2025 23:21 UTC

−2 points

10 comments

2 min read

LW link

(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn

20 Mar 2025 20:01 UTC

201 points

6 comments

2 min read

LW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot

20 Mar 2025 19:12 UTC

16 points

3 comments

6 min read

LW link

(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog

20 Mar 2025 17:12 UTC

18 points

0 comments

2 min read

LW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner

20 Mar 2025 17:12 UTC

28 points

4 comments

13 min read

LW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron

20 Mar 2025 16:18 UTC

4 points

0 comments

1 min read

LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King

20 Mar 2025 15:58 UTC

20 points

21 comments

1 min read

LW link

Human alignment

Lucien

20 Mar 2025 15:52 UTC

−16 points

2 comments

1 min read

LW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt

20 Mar 2025 14:31 UTC

7 points

0 comments

1 min read

LW link

AI #108: Straight Line on a Graph

Zvi

20 Mar 2025 13:50 UTC

43 points

5 comments

39 min read

LW link

(thezvi.wordpress.com)

What is an alignment tax?

Vishakha and Algon

20 Mar 2025 13:06 UTC

5 points

0 comments

1 min read

LW link

(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché

20 Mar 2025 12:20 UTC

3 points

2 comments

21 min read

LW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar

20 Mar 2025 10:44 UTC

1 point

0 comments

1 min read

LW link

Defense Against The Super-Worms

viemccoy

20 Mar 2025 7:24 UTC

24 points

1 comment

2 min read

LW link

Socially Graceful Degradation

Screwtape

20 Mar 2025 4:03 UTC

58 points

10 comments

9 min read

LW link

Apply to MATS 8.0!

Ryan Kidd and K Richards

20 Mar 2025 2:17 UTC

64 points

5 comments

4 min read

LW link

Improved visualizations of METR Time Horizons paper.

LDJ

19 Mar 2025 23:36 UTC

30 points

4 comments

2 min read

LW link

The case against “The case against AI alignment”

KvmanThinking

19 Mar 2025 22:40 UTC

1 point

0 comments

1 min read

LW link

[Question] Superintelligence Strategy: A Pragmatic Path to… Doom?

Mr Beastly

19 Mar 2025 22:30 UTC

8 points

0 comments

3 min read

LW link

SHIFT relies on token-level features to de-bias Bias in Bios probes

Tim Hua

19 Mar 2025 21:29 UTC

39 points

2 comments

6 min read

LW link

Janet must die

Shmi

19 Mar 2025 20:35 UTC

12 points

3 comments

2 min read

LW link

[Question] Why am I getting downvoted on Lesswrong?

Oxidize

19 Mar 2025 18:32 UTC

7 points

14 comments

1 min read

LW link

Forecasting AI Futures Resource Hub

Alvin Ånestrand

19 Mar 2025 17:26 UTC

2 points

0 comments

2 min read

LW link

(forecastingaifutures.substack.com)

TBC episode w Dave Kasten from Control AI on AI Policy

Eneasz

19 Mar 2025 17:09 UTC

14 points

0 comments

1 min read

LW link

(www.thebayesianconspiracy.com)

Prioritizing threats for AI control

ryan_greenblatt

19 Mar 2025 17:09 UTC

59 points

2 comments

10 min read

LW link

The Illusion of Transparency as a Trust-Building Mechanism

Priyanka Bharadwaj

19 Mar 2025 17:09 UTC

2 points

0 comments

1 min read

LW link

How Do We Govern AI Well?

kaime

19 Mar 2025 17:08 UTC

2 points

0 comments

25 min read

LW link

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman

19 Mar 2025 16:00 UTC

242 points

106 comments

5 min read

LW link

(metr.org)

Why I think AI will go poorly for humanity

Alek Westover

19 Mar 2025 15:52 UTC

14 points

0 comments

30 min read

LW link

The principle of genomic liberty

TsviBT

19 Mar 2025 14:27 UTC

76 points

51 comments

17 min read

LW link

Going Nova

Zvi

19 Mar 2025 13:30 UTC

69 points

27 comments

15 min read

LW link

(thezvi.wordpress.com)

Equations Mean Things

abstractapplic

19 Mar 2025 8:16 UTC

56 points

10 comments

3 min read

LW link

Elite Coordination via the Consensus of Power

Richard_Ngo

19 Mar 2025 6:56 UTC

92 points

15 comments

12 min read

LW link

(www.mindthefuture.info)

What I am working on right now and why: representation engineering edition

Lukasz G Bartoszcze

18 Mar 2025 22:37 UTC

3 points

0 comments

3 min read

LW link

Boots theory and Sybil Ramkin

philh

18 Mar 2025 22:10 UTC

37 points

18 comments

11 min read

LW link

(reasonableapproximation.net)

Schmidt Sciences Technical AI Safety RFP on Inference-Time Compute – Deadline: April 30

Ryan Gajarawala

18 Mar 2025 18:05 UTC

18 points

0 comments

2 min read

LW link

(www.schmidtsciences.org)

PRISM: Perspective Reasoning for Integrated Synthesis and Mediation (Interactive Demo)

Anthony Diamond

18 Mar 2025 18:03 UTC

10 points

2 comments

1 min read

LW link

Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Le magicien quantique

18 Mar 2025 17:55 UTC

6 points

1 comment

10 min read

LW link

Progress links and short notes, 2025-03-18

jasoncrawford

18 Mar 2025 17:14 UTC

8 points

0 comments

3 min read

LW link

(newsletter.rootsofprogress.org)

The Convergent Path to the Stars

Maxime Riché

18 Mar 2025 17:09 UTC

6 points

0 comments

20 min read

LW link

Sapir-Whorf Ego Death

Jonathan Moregård

18 Mar 2025 16:57 UTC

8 points

7 comments

2 min read

LW link

(honestliving.substack.com)

Smelling Nice is Good, Actually

Gordon Seidoh Worley

18 Mar 2025 16:54 UTC

29 points

8 comments

3 min read

LW link

(uncertainupdates.substack.com)