An overview of control measures

ryan_greenblatt, 24 Mar 2025 23:16 UTC

40 points

2 comments, 26 min read

Populectomy.ai

YonatanK, 24 Mar 2025 22:06 UTC

7 points

2 comments, 2 min read

Policy for LLM Writing on LessWrong

jimrandomh and Ruby

24 Mar 2025 21:41 UTC

343 points

72 comments, 2 min read

Analyzing long agent transcripts (Docent)

jsteinhardt, 24 Mar 2025 20:49 UTC

41 points

2 comments, 1 min read

(bounded-regret.ghost.io)

Convergence 2024 Impact Review

David_Kristoffersson, 24 Mar 2025 20:28 UTC

13 points

0 comments, 14 min read

The Best Lecture Series on Every Subject

Rauno Arike, 24 Mar 2025 20:03 UTC

13 points

1 comment, 2 min read

Recent AI model progress feels mostly like bullshit

lc, 24 Mar 2025 19:28 UTC

362 points

89 comments, 8 min read

(zeropath.com)

Learning about AI regulation should be easier

mfg, 24 Mar 2025 19:22 UTC

12 points

0 comments, 2 min read

Speaker For AIs Soul

Max Abecassis, 24 Mar 2025 19:20 UTC

−3 points

0 comments, 20 min read

Advanced AI Systems Will Not Follow Historical Technological Patterns and Will Not Suffer the Misattribution of Productivity Gains

Max Abecassis, 24 Mar 2025 19:20 UTC

8 points

0 comments, 10 min read

AI “Deep Research” Tools Reviewed

sarahconstantin, 24 Mar 2025 18:40 UTC

53 points

5 comments, 5 min read

(sarahconstantin.substack.com)

Notes on countermeasures for exploration hacking (aka sandbagging)

ryan_greenblatt, 24 Mar 2025 18:39 UTC

56 points

6 comments, 8 min read

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?

Alex Mallen, Charlie Griffin and Buck

24 Mar 2025 17:55 UTC

35 points

0 comments, 8 min read

Straightforward Steps to Marginally Improve Odds of Whole Brain Emulation

Dom Polsinelli, 24 Mar 2025 17:14 UTC

8 points

20 comments, 6 min read

From Loops to Klein Bottles: Uncovering Hidden Topology in High Dimensional Data

Gunnar Carlsson, 24 Mar 2025 17:09 UTC

15 points

0 comments, 9 min read

Will Jesus Christ return in an election year?

Eric Neyman, 24 Mar 2025 16:50 UTC

427 points

59 comments, 4 min read

(ericneyman.wordpress.com)

Sentinel’s Global Risks Weekly Roundup #12/2025: Famine in Gaza, H7N9 outbreak, US geopolitical leadership weakening.

NunoSempere, 24 Mar 2025 16:46 UTC

13 points

0 comments, 7 min read

(blog.sentinel-team.org)

deleted

funnyfranco, 24 Mar 2025 15:03 UTC

−2 points

0 comments, 1 min read

Delicious Boy Slop—Boring Diet, Effortless Weightloss

sapphire, 24 Mar 2025 15:01 UTC

18 points

8 comments, 4 min read

(sapphstar.substack.com)

Hong Kong ACX Spring Meetup 2025

fbreton, 24 Mar 2025 14:27 UTC

1 point

0 comments, 1 min read

More on Various AI Action Plans

Zvi, 24 Mar 2025 13:10 UTC

32 points

0 comments, 11 min read

(thezvi.wordpress.com)

Emergent scaling effects on the functional hierarchies within LLMs

Paul B, 24 Mar 2025 13:03 UTC

8 points

0 comments, 9 min read

Recommender Alignment for Lock-In Risk

Alfie Lamerton, 24 Mar 2025 12:56 UTC

8 points

0 comments, 7 min read

Edge Cases in AI Alignment

Florian_Dietz, 24 Mar 2025 9:27 UTC

19 points

3 comments, 4 min read

Towards an understanding of the Chinese AI scene

Mitchell_Porter, 24 Mar 2025 9:10 UTC

23 points

0 comments, 2 min read

Selective modularity: a research agenda

cloud and Jacob G-W

24 Mar 2025 4:12 UTC

72 points

3 comments, 24 min read

Pictures for 2024

jefftk, 24 Mar 2025 2:40 UTC

9 points

0 comments, 1 min read

(www.jefftk.com)

Notes on handling non-concentrated failures with AI control: high level methods and different regimes

ryan_greenblatt, 24 Mar 2025 1:00 UTC

24 points

4 comments, 16 min read

We need (a lot) more rogue agent honeypots

Ozyrus, 23 Mar 2025 22:24 UTC

37 points

12 comments, 4 min read

Probability Theory Fundamentals 102: Source of the Sample Space

Ape in the coat, 23 Mar 2025 17:23 UTC

12 points

17 comments, 7 min read

How to mitigate sandbagging

Teun van der Weij, 23 Mar 2025 17:19 UTC

32 points

0 comments, 8 min read

Solving willpower seems easier than solving aging

Yair Halberstadt, 23 Mar 2025 15:25 UTC

71 points

28 comments, 1 min read

[Question] Should I fundraise for open source search engine?

samuelshadrach, 23 Mar 2025 13:04 UTC

−11 points

2 comments, 1 min read

Privateers Reborn: Cyber Letters of Marque

arealsociety, 23 Mar 2025 3:39 UTC

5 points

2 comments, 1 min read

(arealsociety.substack.com)

Beware nerfing AI with opinionated human-centric sensors

Haotian, 23 Mar 2025 1:09 UTC

1 point

0 comments, 3 min read

Reframing AI Safety as a Neverending Institutional Challenge

scasper, 23 Mar 2025 0:13 UTC

53 points

12 comments, 5 min read

The Dangerous Illusion of AI Deterrence: Why MAIM Isn’t Rational

Robert Shuler, 22 Mar 2025 22:55 UTC

3 points

0 comments, 2 min read

Dayton, Ohio, ACX Meetup

Lunawarrior, 22 Mar 2025 19:45 UTC

1 point

0 comments, 1 min read

[Replication] Crosscoder-based Stage-Wise Model Diffing

Anna Soligo, Thomas Read, Oliver Clive-Griffin, dmanningcoe, Chun Hei Yip, rajashree and Jason Gross

22 Mar 2025 18:35 UTC

25 points

0 comments, 7 min read

The Principle of Satisfying Foreknowledge

Randall Reams, 22 Mar 2025 18:20 UTC

1 point

0 comments, 2 min read

[Question] Urgency in the ITN framework

Shaïman, 22 Mar 2025 18:16 UTC

0 points

2 comments, 1 min read

Transhumanism and AI: Toward Prosperity or Extinction?

Shaïman, 22 Mar 2025 18:16 UTC

11 points

2 comments, 6 min read

Tied Crosscoders: Explaining Chat Behavior from Base Model

Santiago Aranguri, 22 Mar 2025 18:07 UTC

9 points

0 comments, 12 min read

100+ concrete projects and open problems in evals

Marius Hobbhahn, 22 Mar 2025 15:21 UTC

75 points

1 comment, 1 min read

Do models say what they learn?

Andy Arditi, marvinli, Joe Benton and Miles Turpin

22 Mar 2025 15:19 UTC

127 points

12 comments, 13 min read

deleted

funnyfranco, 22 Mar 2025 12:06 UTC

2 points

8 comments, 1 min read

2025 Q3 Pivotal Research Fellowship: Applications Open

Tobias H, 22 Mar 2025 10:54 UTC

4 points

0 comments, 2 min read

Good Research Takes are Not Sufficient for Good Strategic Takes

Neel Nanda, 22 Mar 2025 10:13 UTC

297 points

28 comments, 4 min read

(www.neelnanda.io)

Grammatical Roles and Social Roles: A Structural Analogy

Lucien, 22 Mar 2025 7:44 UTC

0 points

0 comments, 1 min read

Legibility

lsusr, 22 Mar 2025 6:54 UTC

22 points

22 comments, 2 min read