Could this be an unusually good time to Earn To Give?

TomGardiner 4 Mar 2025 21:51 UTC

−1 points

0 comments 3 min read LW link

(forum.effectivealtruism.org)

What is the best / most proper definition of “Feeling the AGI” there is?

Annapurna 4 Mar 2025 20:13 UTC

8 points

5 comments 1 min read LW link

Energy Markets Temporal Arbitrage with Batteries

NickyP 4 Mar 2025 17:37 UTC

28 points

3 comments 16 min read LW link

Distillation of Meta’s Large Concept Models Paper

NickyP 4 Mar 2025 17:33 UTC

19 points

3 comments 4 min read LW link

Top AI safety newsletters, books, podcasts, etc – new AISafety.com resource

Bryce Robertson and Søren Elverlin

4 Mar 2025 17:01 UTC

33 points

2 comments 1 min read LW link

2028 Should Not Be AI Safety’s First Foray Into Politics

Jesse Richardson 4 Mar 2025 16:46 UTC

5 points

0 comments 2 min read LW link

[Question] How Much Are LLMs Actually Boosting Real-World Programmer Productivity?

Thane Ruthenis 4 Mar 2025 16:23 UTC

141 points

52 comments 3 min read LW link

Validating against a misalignment detector is very different to training against one

mattmacdermott 4 Mar 2025 15:41 UTC

39 points

4 comments 4 min read LW link

For scheming, we should first focus on detection and then on prevention

Marius Hobbhahn 4 Mar 2025 15:22 UTC

49 points

7 comments 5 min read LW link

Progress links and short notes, 2025-03-03

jasoncrawford 4 Mar 2025 15:20 UTC

8 points

0 comments 6 min read LW link

(newsletter.rootsofprogress.org)

Formation Research: Organisation Overview

alamerton 4 Mar 2025 15:03 UTC

6 points

0 comments 11 min read LW link

On Writing #1

Zvi 4 Mar 2025 13:30 UTC

38 points

2 comments 15 min read LW link

(thezvi.wordpress.com)

The Semi-Rational Militar Firefighter

P. João 4 Mar 2025 12:23 UTC

73 points

10 comments 2 min read LW link

Observations About LLM Inference Pricing

Aaron_Scher 4 Mar 2025 3:03 UTC

34 points

2 comments 9 min read LW link

(techgov.intelligence.org)

[Question] How much should I worry about the Atlanta Fed’s GDP estimates?

Brendan Long 4 Mar 2025 2:03 UTC

16 points

2 comments 1 min read LW link

[Question] shouldn’t we try to get media attention?

KvmanThinking 4 Mar 2025 1:39 UTC

6 points

1 comment 1 min read LW link

The Milton Friedman Model of Policy Change

JohnofCharleston 4 Mar 2025 0:38 UTC

147 points

17 comments 4 min read LW link

The Compliment Sandwich 🥪 aka: How to criticize a normie without making them upset.

keltan 3 Mar 2025 23:15 UTC

15 points

10 comments 1 min read LW link

AI Safety at the Frontier: Paper Highlights, February ’25

gasteigerjo 3 Mar 2025 22:09 UTC

7 points

0 comments 7 min read LW link

(aisafetyfrontier.substack.com)

What goals will AIs have? A list of hypotheses

Daniel Kokotajlo 3 Mar 2025 20:08 UTC

90 points

20 comments 18 min read LW link

Takeaways From Our Recent Work on SAE Probing

Josh Engels, Subhash Kantamneni, Senthooran Rajamanoharan and Neel Nanda

3 Mar 2025 19:50 UTC

30 points

4 comments 5 min read LW link

Why People Commit White Collar Fraud (Ozy linkpost)

sapphire 3 Mar 2025 19:33 UTC

24 points

1 comment 1 min read LW link

(thingofthings.substack.com)

[Question] Ask Me Anything—Samuel

samuelshadrach 3 Mar 2025 19:24 UTC

0 points

0 comments 1 min read LW link

Expanding HarmBench: Investigating Gaps & Extending Adversarial LLM Testing

racinkc1 3 Mar 2025 19:23 UTC

1 point

0 comments 1 min read LW link

Could Advanced AI Accelerate the Pace of AI Progress? Interviews with AI Researchers

jleibowich, Nikola Jurkovic and Tom Davidson

3 Mar 2025 19:05 UTC

41 points

1 comment 1 min read LW link

(papers.ssrn.com)

Middle School Choice

jefftk 3 Mar 2025 16:10 UTC

27 points

10 comments 4 min read LW link

(www.jefftk.com)

On GPT-4.5

Zvi 3 Mar 2025 13:40 UTC

44 points

12 comments 22 min read LW link

(thezvi.wordpress.com)

Coalescence—Determinism In Ways We Care About

vitaliya 3 Mar 2025 13:20 UTC

12 points

0 comments 11 min read LW link

Methods for strong human germline engineering

TsviBT 3 Mar 2025 8:13 UTC

149 points

29 comments 108 min read LW link

[Question] Examples of self-fulfilling prophecies in AI alignment?

Chris Lakin 3 Mar 2025 2:45 UTC

24 points

10 comments 1 min read LW link

[Question] Request for Comments on AI-related Prediction Market Ideas

PeterMcCluskey 2 Mar 2025 20:52 UTC

17 points

1 comment 3 min read LW link

Statistical Challenges with Making Super IQ babies

Jan Christian Refsgaard 2 Mar 2025 20:26 UTC

154 points

26 comments 9 min read LW link

Cautions about LLMs in Human Cognitive Loops

Alice Blair 2 Mar 2025 19:53 UTC

40 points

13 comments 7 min read LW link

Self-fulfilling misalignment data might be poisoning our AI models

TurnTrout 2 Mar 2025 19:51 UTC

156 points

29 comments 1 min read LW link

(turntrout.com)

Spencer Greenberg hiring a personal/professional/research remote assistant for 5-10 hours per week

spencerg 2 Mar 2025 18:01 UTC

13 points

0 comments 1 min read LW link

[Question] Will LLM agents become the first takeover-capable AGIs?

Seth Herd 2 Mar 2025 17:15 UTC

37 points

10 comments 1 min read LW link

Not-yet-falsifiable beliefs?

Benjamin Hendricks 2 Mar 2025 14:11 UTC

6 points

4 comments 1 min read LW link

Saving Zest

jefftk 2 Mar 2025 12:00 UTC

24 points

1 comment 1 min read LW link

(www.jefftk.com)

Open Thread Spring 2025

Ben Pace 2 Mar 2025 2:33 UTC

20 points

48 comments 1 min read LW link

[Question] help, my self image as rational is affecting my ability to empathize with others

KvmanThinking 2 Mar 2025 2:06 UTC

1 point

13 comments 1 min read LW link

Maintaining Alignment during RSI as a Feedback Control Problem

beren 2 Mar 2025 0:21 UTC

67 points

6 comments 11 min read LW link

AI Safety Policy Won’t Go On Like This – AI Safety Advocacy Is Failing Because Nobody Cares.

henophilia 1 Mar 2025 20:15 UTC

1 point

1 comment 1 min read LW link

(blog.hermesloom.org)

Meaning Machines

appromoximate 1 Mar 2025 19:16 UTC

0 points

0 comments 13 min read LW link

[Question] Share AI Safety Ideas: Both Crazy and Not

ank 1 Mar 2025 19:08 UTC

17 points

28 comments 1 min read LW link

Historiographical Compressions: Renaissance as An Example

adamShimi 1 Mar 2025 18:21 UTC

17 points

4 comments 7 min read LW link

(formethods.substack.com)

Real-Time Gigstats

jefftk 1 Mar 2025 14:10 UTC

9 points

0 comments 1 min read LW link

(www.jefftk.com)

Open problems in emergent misalignment

Jan Betley and Daniel Tan

1 Mar 2025 9:47 UTC

83 points

17 comments 7 min read LW link

Estimating the Probability of Sampling a Trained Neural Network at Random

Adam Scherlis and Nora Belrose

1 Mar 2025 2:11 UTC

32 points

10 comments 1 min read LW link

(arxiv.org)

[Question] What nation did Trump prevent from going to war (Feb. 2025)?

James Camacho 1 Mar 2025 1:46 UTC

3 points

5 comments 1 min read LW link

AXRP Episode 38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future

DanielFilan 1 Mar 2025 1:20 UTC

13 points

0 comments 13 min read LW link