Miles Brundage resigned from OpenAI, and his AGI readiness team was disbanded

garrison

23 Oct 2024 23:40 UTC

118 points

1 comment · 7 min read · LW link

(garrisonlovely.substack.com)

A metaphor: what “green lights” for AGI would look like

Lorec

23 Oct 2024 23:24 UTC

−1 points

6 comments · 2 min read · LW link

Motte-and-Bailey: a Short Explanation

Lorec

23 Oct 2024 22:29 UTC

12 points

0 comments · 1 min read · LW link

Self-prediction acts as an emergent regularizer

Cameron Berg, Kvee, Mike Vaiana, Diogo de Lucena, florin_pop and Trent Hodgeson

23 Oct 2024 22:27 UTC

92 points

9 comments · 4 min read · LW link

Technical Risks of (Lethal) Autonomous Weapons Systems

Heramb

23 Oct 2024 20:41 UTC

2 points

0 comments · 1 min read · LW link

(encodejustice.org)

Appealing to the Public

jefftk

23 Oct 2024 19:00 UTC

16 points

0 comments · 5 min read · LW link

(www.jefftk.com)

Introducing Transluce — A Letter from the Founders

jsteinhardt

23 Oct 2024 18:10 UTC

74 points

3 comments · 3 min read · LW link

(bounded-regret.ghost.io)

Are we dropping the ball on Recommendation AIs?

Charbel-Raphaël

23 Oct 2024 17:48 UTC

53 points

17 comments · 6 min read · LW link

A bird’s eye view of ARC’s research

Jacob_Hilton

23 Oct 2024 15:50 UTC

121 points

12 comments · 7 min read · LW link

(www.alignment.org)

[Question] Artificial V/S Organoid Intelligence

10xyz

23 Oct 2024 14:31 UTC

9 points

0 comments · 1 min read · LW link

AI safety tax dynamics

owencb

23 Oct 2024 12:18 UTC

22 points

0 comments · 6 min read · LW link

(strangecities.substack.com)

What is malevolence? On the nature, measurement, and distribution of dark traits

David Althaus, Chi Nguyen and Clare

23 Oct 2024 8:41 UTC

94 points

22 comments · 52 min read · LW link

Join a LessWrong Team for the Unaging System Challenge

Crissman

23 Oct 2024 6:01 UTC

15 points

5 comments · 1 min read · LW link

Word Spaghetti

Gordon Seidoh Worley

23 Oct 2024 5:39 UTC

20 points

9 comments · 3 min read · LW link

Monosemanticity & Quantization

Rahul Chand

22 Oct 2024 22:57 UTC

1 point

0 comments · 9 min read · LW link

[Question] What is the alpha in one bit of evidence?

J Bostock

22 Oct 2024 21:57 UTC

20 points

13 comments · 1 min read · LW link

Catastrophic sabotage as a major threat model for human-level AI systems

evhub

22 Oct 2024 20:57 UTC

97 points

13 comments · 15 min read · LW link

Why I quit effective altruism, and why Timothy Telleen-Lawton is staying (for now)

Elizabeth

22 Oct 2024 18:20 UTC

76 points

82 comments · 1 min read · LW link

(acesounderglass.com)

Decision-Making Under Uncertainty: Lessons From AI

Jonasb

22 Oct 2024 17:54 UTC

−1 points

0 comments · 5 min read · LW link

(www.denominations.io)

Testing Genetic Engineering Detection with Spike-Ins

jefftk

22 Oct 2024 17:20 UTC

9 points

0 comments · 4 min read · LW link

(naobservatory.org)

Predictions as Public Works Project — What Metaculus Is Building Next

ChristianWilliams

22 Oct 2024 16:35 UTC

5 points

0 comments · 5 min read · LW link

(www.metaculus.com)

Gorges of gender on a terrain of traits

dkl9

22 Oct 2024 16:18 UTC

−7 points

1 comment · 3 min read · LW link

(dkl9.net)

A Defense of Peer Review

Niko_McCarty and delton137

22 Oct 2024 16:16 UTC

23 points

2 comments · 22 min read · LW link

(www.asimov.press)

BIG-Bench Canary Contamination in GPT-4

Jozdien

22 Oct 2024 15:40 UTC

138 points

19 comments · 4 min read · LW link · 1 review

[Paper Blogpost] When Your AIs Deceive You: Challenges with Partial Observability in RLHF

Leon Lang

22 Oct 2024 13:57 UTC

51 points

2 comments · 18 min read · LW link

(arxiv.org)

[Intuitive self-models] 6. Awakening / Enlightenment / PNSE

Steven Byrnes

22 Oct 2024 13:23 UTC

78 points

14 comments · 21 min read · LW link · 2 reviews

Resolving von Neumann-Morgenstern Inconsistent Preferences

niplav

22 Oct 2024 11:45 UTC

39 points

5 comments · 58 min read · LW link

Lenses of Control

WillPetillo

22 Oct 2024 7:51 UTC

15 points

0 comments · 9 min read · LW link

A Brief Explanation of AI Control

Aaron_Scher

22 Oct 2024 7:00 UTC

8 points

1 comment · 6 min read · LW link

Longevity, AI, and Cognitive Research Hackathon @ MIT

ekkolápto

22 Oct 2024 6:19 UTC

1 point

0 comments · 1 min read · LW link

Conversational Signposts—How to stop having boring social interactions

Declan Molony

22 Oct 2024 5:37 UTC

13 points

6 comments · 2 min read · LW link

I got dysentery so you don’t have to

eukaryote

22 Oct 2024 4:55 UTC

340 points

8 comments · 17 min read · LW link · 2 reviews

(eukaryotewritesblog.com)

Transformers Explained (Again)

RohanS

22 Oct 2024 4:06 UTC

4 points

0 comments · 18 min read · LW link

Sleeping on Stage

jefftk

22 Oct 2024 0:50 UTC

26 points

3 comments · 1 min read · LW link

(www.jefftk.com)

The Mask Comes Off: At What Price?

Zvi

21 Oct 2024 23:50 UTC

72 points

16 comments · 8 min read · LW link

(thezvi.wordpress.com)

Distinguishing ways AI can be “concentrated”

Matthew Barnett

21 Oct 2024 22:21 UTC

34 points

2 comments · 4 min read · LW link

Jailbreaking ChatGPT and Claude using Web API Context Injection

Jaehyuk Lim

21 Oct 2024 21:34 UTC

4 points

0 comments · 3 min read · LW link

How to Teach Your Brain to Hate Procrastination

10xyz

21 Oct 2024 20:12 UTC

3 points

0 comments · 2 min read · LW link

Pausing for what?

MountainPath

21 Oct 2024 20:12 UTC

0 points

1 comment · 1 min read · LW link

What is autonomy? Why boundaries are necessary.

Chris Lakin

21 Oct 2024 17:56 UTC

8 points

1 comment · 1 min read · LW link

(chrislakin.blog)

Could randomly choosing people to serve as representatives lead to better government?

John Huang

21 Oct 2024 17:10 UTC

77 points

13 comments · 10 min read · LW link

There aren’t enough smart people in biology doing something boring

Abhishaike Mahajan

21 Oct 2024 15:52 UTC

28 points

13 comments · 10 min read · LW link

Automation collapse

Geoffrey Irving, Tomek Korbak and Benjamin Hilton

21 Oct 2024 14:50 UTC

72 points

9 comments · 7 min read · LW link

What AI companies should do: Some rough ideas

Zach Stein-Perlman

21 Oct 2024 14:00 UTC

33 points

10 comments · 5 min read · LW link

[Question] What should OpenAI do that it hasn’t already done, to stop their vacancies from being advertised on the 80k Job Board?

WitheringWeights

21 Oct 2024 13:57 UTC

23 points

0 comments · 1 min read · LW link

A Rocket–Interpretability Analogy

plex

21 Oct 2024 13:55 UTC

161 points

33 comments · 1 min read · LW link · 1 review

Tokyo AI Safety 2025: Call For Papers

Blaine

21 Oct 2024 8:43 UTC

24 points

0 comments · 3 min read · LW link

(www.tais2025.cc)

OpenAI defected, but we can take honest actions

Remmelt

21 Oct 2024 8:41 UTC

17 points

16 comments · 2 min read · LW link

Slightly More Than You Wanted To Know: Pregnancy Length Effects

JustisMills

21 Oct 2024 1:26 UTC

63 points

4 comments · 5 min read · LW link

(justismills.substack.com)

Information vs Assurance

johnswentworth

20 Oct 2024 23:16 UTC

191 points

19 comments · 2 min read · LW link · 1 review