All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 202420252026

All Jan FebMarApr May Jun Jul Aug Sep Oct Nov Dec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 202122 23 24 25 26 27 28 29 30 31

A Critique of “Utility”

Zero Contradictions20 Mar 2025 23:21 UTC

−2 points

10 comments2 min readLW link

(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn20 Mar 2025 20:01 UTC

201 points

6 comments2 min readLW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot20 Mar 2025 19:12 UTC

16 points

3 comments6 min readLW link

(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog20 Mar 2025 17:12 UTC

18 points

0 comments2 min readLW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner20 Mar 2025 17:12 UTC

28 points

4 comments13 min readLW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron20 Mar 2025 16:18 UTC

4 points

0 comments1 min readLW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King20 Mar 2025 15:58 UTC

20 points

21 comments1 min readLW link

Human alignment

Lucien20 Mar 2025 15:52 UTC

−16 points

2 comments1 min readLW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt20 Mar 2025 14:31 UTC

7 points

0 comments1 min readLW link

AI #108: Straight Line on a Graph

Zvi20 Mar 2025 13:50 UTC

43 points

5 comments39 min readLW link

(thezvi.wordpress.com)

What is an alignment tax?

Vishakha and Algon

20 Mar 2025 13:06 UTC

5 points

0 comments1 min readLW link

(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché20 Mar 2025 12:20 UTC

3 points

2 comments21 min readLW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar20 Mar 2025 10:44 UTC

1 point

0 comments1 min readLW link

Defense Against The Super-Worms

viemccoy20 Mar 2025 7:24 UTC

24 points

1 comment2 min readLW link

Socially Graceful Degradation

Screwtape20 Mar 2025 4:03 UTC

58 points

10 comments9 min readLW link

Apply to MATS 8.0!

Ryan Kidd and K Richards

20 Mar 2025 2:17 UTC

64 points

5 comments4 min readLW link

Improved visualizations of METR Time Horizons paper.

LDJ19 Mar 2025 23:36 UTC

30 points

4 comments2 min readLW link

The case against “The case against AI alignment”

KvmanThinking19 Mar 2025 22:40 UTC

1 point

0 comments1 min readLW link

[Question] Superintelligence Strategy: A Pragmatic Path to… Doom?

Mr Beastly19 Mar 2025 22:30 UTC

8 points

0 comments3 min readLW link

SHIFT relies on token-level features to de-bias Bias in Bios probes

Tim Hua19 Mar 2025 21:29 UTC

39 points

2 comments6 min readLW link

Janet must die

Shmi19 Mar 2025 20:35 UTC

12 points

3 comments2 min readLW link

[Question] Why am I getting downvoted on Lesswrong?

Oxidize19 Mar 2025 18:32 UTC

7 points

14 comments1 min readLW link

Forecasting AI Futures Resource Hub

Alvin Ånestrand19 Mar 2025 17:26 UTC

2 points

0 comments2 min readLW link

(forecastingaifutures.substack.com)

TBC episode w Dave Kasten from Control AI on AI Policy

Eneasz19 Mar 2025 17:09 UTC

14 points

0 comments1 min readLW link

(www.thebayesianconspiracy.com)

Prioritizing threats for AI control

ryan_greenblatt19 Mar 2025 17:09 UTC

59 points

2 comments10 min readLW link

The Illusion of Transparency as a Trust-Building Mechanism

Priyanka Bharadwaj19 Mar 2025 17:09 UTC

2 points

0 comments1 min readLW link

How Do We Govern AI Well?

kaime19 Mar 2025 17:08 UTC

2 points

0 comments25 min readLW link

METR: Measuring AI Ability to Complete Long Tasks

Zach Stein-Perlman19 Mar 2025 16:00 UTC

242 points

106 comments5 min readLW link

(metr.org)

Why I think AI will go poorly for humanity

Alek Westover19 Mar 2025 15:52 UTC

14 points

0 comments30 min readLW link

The principle of genomic liberty

TsviBT19 Mar 2025 14:27 UTC

76 points

51 comments17 min readLW link

Going Nova

Zvi19 Mar 2025 13:30 UTC

69 points

27 comments15 min readLW link

(thezvi.wordpress.com)

Equations Mean Things

abstractapplic19 Mar 2025 8:16 UTC

56 points

10 comments3 min readLW link

Elite Coordination via the Consensus of Power

Richard_Ngo19 Mar 2025 6:56 UTC

92 points

15 comments12 min readLW link

(www.mindthefuture.info)

What I am working on right now and why: representation engineering edition

Lukasz G Bartoszcze18 Mar 2025 22:37 UTC

3 points

0 comments3 min readLW link

Boots theory and Sybil Ramkin

philh18 Mar 2025 22:10 UTC

37 points

18 comments11 min readLW link

(reasonableapproximation.net)

Schmidt Sciences Technical AI Safety RFP on Inference-Time Compute – Deadline: April 30

Ryan Gajarawala18 Mar 2025 18:05 UTC

18 points

0 comments2 min readLW link

(www.schmidtsciences.org)

PRISM: Perspective Reasoning for Integrated Synthesis and Mediation (Interactive Demo)

Anthony Diamond18 Mar 2025 18:03 UTC

10 points

2 comments1 min readLW link

Subspace Rerouting: Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models

Le magicien quantique18 Mar 2025 17:55 UTC

6 points

1 comment10 min readLW link

Progress links and short notes, 2025-03-18

jasoncrawford18 Mar 2025 17:14 UTC

8 points

0 comments3 min readLW link

(newsletter.rootsofprogress.org)

The Convergent Path to the Stars

Maxime Riché18 Mar 2025 17:09 UTC

6 points

0 comments20 min readLW link

Sapir-Whorf Ego Death

Jonathan Moregård18 Mar 2025 16:57 UTC

8 points

7 comments2 min readLW link

(honestliving.substack.com)

Smelling Nice is Good, Actually

Gordon Seidoh Worley18 Mar 2025 16:54 UTC

29 points

8 comments3 min readLW link

(uncertainupdates.substack.com)

A Taxonomy of Jobs Deeply Resistant to TAI Automation

Deric Cheng18 Mar 2025 16:25 UTC

9 points

0 comments12 min readLW link

(www.convergenceanalysis.org)

Why Are The Human Sciences Hard? Two New Hypotheses

Aydin Mohseni, Daniel Herrmann and ben_levinstein

18 Mar 2025 15:45 UTC

39 points

14 comments9 min readLW link

Go home GPT-4o, you’re drunk: emergent misalignment as lowered inhibitions

Stuart_Armstrong and rgorman

18 Mar 2025 14:48 UTC

81 points

12 comments5 min readLW link

[Question] What is the theory of change behind writing papers about AI safety?

Kajus18 Mar 2025 12:51 UTC

7 points

1 comment1 min readLW link

OpenAI #11: America Action Plan

Zvi18 Mar 2025 12:50 UTC

83 points

3 comments6 min readLW link

(thezvi.wordpress.com)

I changed my mind about orca intelligence

Towards_Keeperhood18 Mar 2025 10:15 UTC

54 points

24 comments5 min readLW link

[Question] Is Peano arithmetic trying to kill us? Do we care?

Q Home18 Mar 2025 8:22 UTC

17 points

2 comments2 min readLW link

Do What the Mammals Do

CrimsonChin18 Mar 2025 3:57 UTC

2 points

6 comments4 min readLW link