3 Apr 2025 16:23 UTC

676 points

222 comments, 41 min read

(ai-2027.com)

Accountability Sinks

Martin Sustrik, 22 Apr 2025 5:00 UTC

452 points

58 comments, 15 min read

(250bpm.substack.com)

Playing in the Creek

Hastings, 10 Apr 2025 17:39 UTC

401 points

13 comments, 2 min read

(hgreer.com)

LessWrong has been acquired by EA

habryka, 1 Apr 2025 13:09 UTC

364 points

55 comments, 1 min read

VDT: a solution to decision theory

L Rudolf L, 1 Apr 2025 21:04 UTC

353 points

33 comments, 4 min read

Why Have Sentence Lengths Decreased?

Arjun Panickssery, 3 Apr 2025 17:50 UTC

292 points

89 comments, 4 min read

(arjunpanickssery.substack.com)

To Understand History, Keep Former Population Distributions In Mind

Arjun Panickssery, 23 Apr 2025 4:51 UTC

243 points

13 comments, 2 min read

(arjunpanickssery.substack.com)

Jaan Tallinn’s 2024 Philanthropy Overview

jaan, 23 Apr 2025 11:06 UTC

228 points

8 comments, 1 min read

(jaan.info)

Thoughts on AI 2027

Max Harms, 9 Apr 2025 21:26 UTC

223 points

61 comments, 21 min read

(intelligence.org)

Learned pain as a leading cause of chronic pain

SoerenMind, 9 Apr 2025 11:57 UTC

216 points

39 comments, 9 min read

Impact, agency, and taste

benkuhn, 19 Apr 2025 21:10 UTC

206 points

10 comments, 8 min read

(www.benkuhn.net)

Short Timelines Don’t Devalue Long Horizon Research

Vladimir_Nesov, 9 Apr 2025 0:42 UTC

178 points

24 comments, 1 min read

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

Kaj_Sotala, 15 Apr 2025 15:56 UTC

176 points

52 comments, 18 min read

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen, 14 Apr 2025 17:38 UTC

158 points

42 comments, 7 min read

(adamkarvonen.github.io)

Alignment Faking Revisited: Improved Classifiers and Open Source Extensions

John Hughes, abhayesian, Akbir Khan and Fabien Roger

8 Apr 2025 17:32 UTC

147 points

20 comments, 12 min read

Training AGI in Secret would be Unsafe and Unethical

Daniel Kokotajlo, 18 Apr 2025 12:27 UTC

140 points

15 comments, 6 min read

AI-enabled coups: a small group could use AI to seize power

Tom Davidson, Lukas Finnveden and rosehadshar

16 Apr 2025 16:51 UTC

137 points

23 comments, 7 min read

AI 2027 is a Bet Against Amdahl’s Law

snewman, 21 Apr 2025 3:09 UTC

127 points

57 comments, 9 min read

Ctrl-Z: Controlling AI Agents via Resampling

Aryan Bhatt, Buck, Adam Kaufman and Tyler Tracy

16 Apr 2025 16:21 UTC

126 points

0 comments, 20 min read

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red

Julian Bradshaw, 21 Apr 2025 3:52 UTC

124 points

20 comments, 14 min read

Three Months In, Evaluating Three Rationalist Cases for Trump

Arjun Panickssery, 18 Apr 2025 8:27 UTC

117 points

33 comments, 4 min read

Show, not tell: GPT-4o is more opinionated in images than in text

Daniel Tan and eggsyntax

2 Apr 2025 8:51 UTC

116 points

42 comments, 3 min read

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes, 24 Apr 2025 13:57 UTC

115 points

48 comments, 23 min read

We should try to automate AI safety work asap

Marius Hobbhahn, 26 Apr 2025 16:35 UTC

114 points

10 comments, 15 min read

Among Us: A Sandbox for Agentic Deception

7vik and Adrià Garriga-alonso

5 Apr 2025 6:24 UTC

114 points

7 comments, 7 min read

Misrepresentation as a Barrier for Interp (Part I)

johnswentworth and Steve Petersen

29 Apr 2025 17:07 UTC

113 points

12 comments, 7 min read

AI 2027: Responses

Zvi, 8 Apr 2025 12:50 UTC

111 points

3 comments, 30 min read

(thezvi.wordpress.com)

New Cause Area Proposal

CallumMcDougall, 1 Apr 2025 7:12 UTC

110 points

4 comments, 1 min read

How training-gamers might function (and win)

Vivek Hebbar, 11 Apr 2025 21:26 UTC

110 points

5 comments, 13 min read

The Lizardman and the Black Hat Bobcat

Screwtape, 6 Apr 2025 19:02 UTC

109 points

15 comments, 9 min read

The Uses of Complacency

sarahconstantin, 21 Apr 2025 18:50 UTC

99 points

5 comments, 8 min read

(sarahconstantin.substack.com)

One-shot steering vectors cause emergent misalignment, too

Jacob Dunefsky, 14 Apr 2025 6:40 UTC

98 points

6 comments, 11 min read

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei Nishimura-Gasparian, 24 Apr 2025 16:03 UTC

97 points

6 comments, 1 min read

How to Build a Third Place on Focusmate

Parker Conley, 28 Apr 2025 23:46 UTC

97 points

10 comments, 5 min read

(parconley.com)

ASI existential risk: Reconsidering Alignment as a Goal

habryka, 15 Apr 2025 19:57 UTC

95 points

14 comments, 19 min read

(michaelnotebook.com)

7+ tractable directions in AI control

Julian Stastny and ryan_greenblatt

28 Apr 2025 17:12 UTC

93 points

1 comment, 13 min read

How To Believe False Things

Eneasz, 2 Apr 2025 16:28 UTC

93 points

14 comments, 3 min read

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

johnswentworth and David Lorell

21 Apr 2025 20:19 UTC

92 points

24 comments, 3 min read

Is Gemini now better than Claude at Pokémon?

Julian Bradshaw, 19 Apr 2025 23:34 UTC

91 points

12 comments, 5 min read

A Slow Guide to Confronting Doom

Ruby, 6 Apr 2025 2:10 UTC

86 points

20 comments, 14 min read

Keltham’s Lectures in Project Lawful

Morpheus, 1 Apr 2025 10:39 UTC

86 points

7 comments, 2 min read

GPT-4o Is An Absurd Sycophant

Zvi, 28 Apr 2025 19:00 UTC

84 points

7 comments, 19 min read

(thezvi.wordpress.com)

o3 Is a Lying Liar

Zvi, 23 Apr 2025 20:00 UTC

84 points

26 comments, 9 min read

(thezvi.wordpress.com)

How people use LLMs

Elizabeth, 27 Apr 2025 21:48 UTC

83 points

6 comments, 1 min read

(www.gleech.org)

What Makes an AI Startup “Net Positive” for Safety?

jacquesthibs, 18 Apr 2025 20:33 UTC

82 points

23 comments, 2 min read

Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures

Elizabeth, 29 Apr 2025 20:40 UTC

81 points

15 comments, 1 min read

(acesounderglass.com)

Announcing ILIAD2: ODYSSEY

Alexander Gietelink Oldenziel and windows

3 Apr 2025 17:01 UTC

80 points

1 comment, 1 min read

You will crash your car in front of my house within the next week

Richard Korzekwa, 1 Apr 2025 21:43 UTC

80 points

6 comments, 1 min read

New Paper: Infra-Bayesian Decision-Estimation Theory

Vanessa Kosoy and Diffractor

10 Apr 2025 9:17 UTC

79 points

4 comments, 1 min read

(arxiv.org)

Why does LW not put much more focus on AI governance and outreach?

Severin T. Seehrich and Benjamin Schmidt

12 Apr 2025 14:24 UTC

78 points

31 comments, 2 min read