We need (a lot) more rogue agent honeypots

Ozyrus

23 Mar 2025 22:24 UTC

37 points

12 comments · 4 min read · LW link

Probability Theory Fundamentals 102: Source of the Sample Space

Ape in the coat

23 Mar 2025 17:23 UTC

12 points

17 comments · 7 min read · LW link

How to mitigate sandbagging

Teun van der Weij

23 Mar 2025 17:19 UTC

32 points

0 comments · 8 min read · LW link

Solving willpower seems easier than solving aging

Yair Halberstadt

23 Mar 2025 15:25 UTC

71 points

28 comments · 1 min read · LW link

[Question] Should I fundraise for open source search engine?

samuelshadrach

23 Mar 2025 13:04 UTC

−11 points

2 comments · 1 min read · LW link

Privateers Reborn: Cyber Letters of Marque

arealsociety

23 Mar 2025 3:39 UTC

5 points

2 comments · 1 min read · LW link

(arealsociety.substack.com)

Beware nerfing AI with opinionated human-centric sensors

Haotian

23 Mar 2025 1:09 UTC

1 point

0 comments · 3 min read · LW link

Reframing AI Safety as a Neverending Institutional Challenge

scasper

23 Mar 2025 0:13 UTC

53 points

12 comments · 5 min read · LW link

The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

Robert Shuler

22 Mar 2025 22:55 UTC

3 points

0 comments · 2 min read · LW link

Dayton, Ohio, ACX Meetup

Lunawarrior

22 Mar 2025 19:45 UTC

1 point

0 comments · 1 min read · LW link

[Replication] Crosscoder-based Stage-Wise Model Diffing

Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree and Jason Gross

22 Mar 2025 18:35 UTC

25 points

0 comments · 7 min read · LW link

The Principle of Satisfying Foreknowledge

Randall Reams

22 Mar 2025 18:20 UTC

1 point

0 comments · 2 min read · LW link

[Question] Urgency in the ITN framework

Shaïman

22 Mar 2025 18:16 UTC

0 points

2 comments · 1 min read · LW link

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman

22 Mar 2025 18:16 UTC

11 points

2 comments · 6 min read · LW link

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri

22 Mar 2025 18:07 UTC

9 points

0 comments · 12 min read · LW link

100+ concrete projects and open problems in evals

Marius Hobbhahn

22 Mar 2025 15:21 UTC

75 points

1 comment · 1 min read · LW link

Do models say what they learn?

Andy Arditi, marvinli, Joe Benton and Miles Turpin

22 Mar 2025 15:19 UTC

127 points

12 comments · 13 min read · LW link

deleted

funnyfranco

22 Mar 2025 12:06 UTC

2 points

8 comments · 1 min read · LW link

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H

22 Mar 2025 10:54 UTC

4 points

0 comments · 2 min read · LW link

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda

22 Mar 2025 10:13 UTC

297 points

28 comments · 4 min read · LW link

(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien

22 Mar 2025 7:44 UTC

0 points

0 comments · 1 min read · LW link

Legibility

lsusr

22 Mar 2025 6:54 UTC

22 points

22 comments · 2 min read · LW link

Why Were We Wrong About China and AI? A Case Study in Failed Rationality

[deleted-by-moderator]

22 Mar 2025 5:13 UTC

28 points

47 comments · 1 min read · LW link

A Short Diatribe on Hidden Assertions.

Eggs

22 Mar 2025 3:14 UTC

−9 points

2 comments · 3 min read · LW link

Transformer Attention’s High School Math Mistake

Max Ma

22 Mar 2025 0:16 UTC

−13 points

1 comment · 1 min read · LW link

Making Sense of President Trump’s Annexation Obsession

Annapurna

21 Mar 2025 21:10 UTC

−13 points

3 comments · 5 min read · LW link

(jorgevelez.substack.com)

How I force LLMs to generate correct code

claudio

21 Mar 2025 14:40 UTC

91 points

7 comments · 5 min read · LW link

Prospects for Alignment Automation: Interpretability Case Study

Jacob Pfau and Geoffrey Irving

21 Mar 2025 14:05 UTC

33 points

5 comments · 8 min read · LW link

Epoch AI released a GATE Scenario Explorer

Lee.aao

21 Mar 2025 13:57 UTC

10 points

0 comments · 1 min read · LW link

(epoch.ai)

They Took MY Job?

Zvi

21 Mar 2025 13:30 UTC

37 points

4 comments · 9 min read · LW link

(thezvi.wordpress.com)

Silly Time

jefftk

21 Mar 2025 12:30 UTC

45 points

2 comments · 2 min read · LW link

(www.jefftk.com)

Towards a scale-free theory of intelligent agency

Richard_Ngo

21 Mar 2025 1:39 UTC

111 points

52 comments · 13 min read · LW link

(www.mindthefuture.info)

[Question] Any mistakes in my understanding of Transformers?

Kallistos

21 Mar 2025 0:34 UTC

3 points

7 comments · 1 min read · LW link

A Critique of “Utility”

Zero Contradictions

20 Mar 2025 23:21 UTC

−2 points

10 comments · 2 min read · LW link

(thewaywardaxolotl.blogspot.com)

Intention to Treat

Alicorn

20 Mar 2025 20:01 UTC

214 points

7 comments · 2 min read · LW link

Anthropic: Progress from our Frontier Red Team

UnofficialLinkpostBot

20 Mar 2025 19:12 UTC

16 points

3 comments · 6 min read · LW link

(www.anthropic.com)

Everything’s An Emergency

Bentham's Bulldog

20 Mar 2025 17:12 UTC

18 points

0 comments · 2 min read · LW link

Non-Consensual Consent: The Performance of Choice in a Coercive World

Alex_Steiner

20 Mar 2025 17:12 UTC

26 points

4 comments · 13 min read · LW link

Minor interpretability exploration #4: LayerNorm and the learning coefficient

Rareș Baron

20 Mar 2025 16:18 UTC

4 points

0 comments · 1 min read · LW link

[Question] How far along Metr’s law can AI start automating or helping with alignment research?

Christopher King

20 Mar 2025 15:58 UTC

20 points

21 comments · 1 min read · LW link

Human alignment

Lucien

20 Mar 2025 15:52 UTC

−16 points

2 comments · 1 min read · LW link

[Question] Seeking: more Sci Fi micro reviews

Yair Halberstadt

20 Mar 2025 14:31 UTC

7 points

0 comments · 1 min read · LW link

AI #108: Straight Line on a Graph

Zvi

20 Mar 2025 13:50 UTC

43 points

5 comments · 39 min read · LW link

(thezvi.wordpress.com)

What is an alignment tax?

Vishakha and Algon

20 Mar 2025 13:06 UTC

5 points

0 comments · 1 min read · LW link

(aisafety.info)

Longtermist Implications of the Existence Neutrality Hypothesis

Maxime Riché

20 Mar 2025 12:20 UTC

3 points

2 comments · 21 min read · LW link

You don’t have to be “into EA” to attend EAG(x) Conferences

gergogaspar

20 Mar 2025 10:44 UTC

1 point

0 comments · 1 min read · LW link

Defense Against The Super-Worms

viemccoy

20 Mar 2025 7:24 UTC

24 points

1 comment · 2 min read · LW link

Socially Graceful Degradation

Screwtape

20 Mar 2025 4:03 UTC

58 points

10 comments · 9 min read · LW link

Apply to MATS 8.0!

Ryan Kidd and K Richards

20 Mar 2025 2:17 UTC

64 points

5 comments · 4 min read · LW link

Improved visualizations of METR Time Horizons paper.

LDJ

19 Mar 2025 23:36 UTC

30 points

4 comments · 2 min read · LW link