Human-AI Relationality is Already Here

bridgebot

Feb 20, 2025, 7:08 AM

17 points

0 comments · 15 min read · LW link

Safe Distillation With a Powerful Untrusted AI

Alek Westover

Feb 20, 2025, 3:14 AM

5 points

1 comment · 5 min read · LW link

Modularity and assembly: AI safety via thinking smaller

D Wong

Feb 20, 2025, 12:58 AM

2 points

0 comments · 11 min read · LW link

(criticalreason.substack.com)

Eliezer’s Lost Alignment Articles / The Arbital Sequence

Ruby and RobertM

Feb 20, 2025, 12:48 AM

207 points

10 comments · 5 min read · LW link

Arbital has been imported to LessWrong

RobertM, jimrandomh, Ben Pace and Ruby

Feb 20, 2025, 12:47 AM

281 points

30 comments · 5 min read · LW link

The Dilemma’s Dilemma

James Stephen Brown

Feb 19, 2025, 11:50 PM

9 points

12 comments · 7 min read · LW link

(nonzerosum.games)

[Question] Why do we have the NATO logo?

KvmanThinking

Feb 19, 2025, 10:59 PM

1 point

4 comments · 1 min read · LW link

Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap

Molly and Tom Liptay

Feb 19, 2025, 10:42 PM

13 points

0 comments · 13 min read · LW link

(www.metaculus.com)

Several Arguments Against the Mathematical Universe Hypothesis

Vittu Perkele

Feb 19, 2025, 10:13 PM

−4 points

6 comments · 3 min read · LW link

(open.substack.com)

Literature Review of Text AutoEncoders

NickyP

Feb 19, 2025, 9:54 PM

20 points

5 comments · 8 min read · LW link

DeepSeek Made it Even Harder for US AI Companies to Ever Reach Profitability

garrison

Feb 19, 2025, 9:02 PM

10 points

1 comment · 3 min read · LW link

(garrisonlovely.substack.com)

Won’t vs. Can’t: Sandbagging-like Behavior from Claude Models

Joe Benton and Zachary Witten

Feb 19, 2025, 8:47 PM

15 points

1 comment · 1 min read · LW link

(alignment.anthropic.com)

AI Alignment and the Financial War Against Narcissistic Manipulation

henophilia

Feb 19, 2025, 8:42 PM

−17 points

2 comments · 3 min read · LW link

How to Make Superbabies

GeneSmith and kman

Feb 19, 2025, 8:39 PM

617 points

355 comments · 31 min read · LW link

The Newbie’s Guide to Navigating AI Futures

keithjmenezes

Feb 19, 2025, 8:37 PM

−1 points

0 comments · 40 min read · LW link

Against Unlimited Genius for Baby-Killers

ggggg

Feb 19, 2025, 8:33 PM

−7 points

1 comment · 3 min read · LW link

(ggggggggggggggggggggggg.substack.com)

New LLM Scaling Law

wrmedford

Feb 19, 2025, 8:21 PM

2 points

0 comments · 1 min read · LW link

(github.com)

Go Grok Yourself

Zvi

Feb 19, 2025, 8:20 PM

57 points

2 comments · 17 min read · LW link

(thezvi.wordpress.com)

[Question] Take over my project: do computable agents plan against the universal distribution pessimistically?

Cole Wyeth

Feb 19, 2025, 8:17 PM

25 points

3 comments · 3 min read · LW link

When should we worry about AI power-seeking?

Joe Carlsmith

Feb 19, 2025, 7:44 PM

22 points

0 comments · 18 min read · LW link

(joecarlsmith.substack.com)

SuperBabies podcast with Gene Smith

Eneasz

Feb 19, 2025, 7:36 PM

35 points

1 comment · 1 min read · LW link

(thebayesianconspiracy.substack.com)

Undesirable Conclusions and Origin Adjustment

Jerdle

Feb 19, 2025, 6:35 PM

3 points

0 comments · 5 min read · LW link

How might we safely pass the buck to AI?

joshc

Feb 19, 2025, 5:48 PM

83 points

58 comments · 31 min read · LW link

Using Prompt Evaluation to Combat Bio-Weapon Research

Stuart_Armstrong and rgorman

Feb 19, 2025, 12:39 PM

11 points

2 comments · 3 min read · LW link

Intelligence Is Jagged

Adam Train

Feb 19, 2025, 7:08 AM

6 points

1 comment · 3 min read · LW link

Closed-ended questions aren’t as hard as you think

electroswing

Feb 19, 2025, 3:53 AM

6 points

0 comments · 3 min read · LW link

Undergrad AI Safety Conference

JoNeedsSleep

Feb 19, 2025, 3:43 AM

19 points

0 comments · 1 min read · LW link

Permanent properties of things are a self-fulfilling prophecy

YanLyutnev

Feb 19, 2025, 12:08 AM

4 points

0 comments · 9 min read · LW link

Places of Loving Grace [Story]

ank

Feb 18, 2025, 11:49 PM

−1 points

0 comments · 4 min read · LW link

Are SAE features from the Base Model still meaningful to LLaVA?

Shan23Chen

Feb 18, 2025, 10:16 PM

8 points

2 comments · 10 min read · LW link

(www.lesswrong.com)

Sparse Autoencoder Features for Classifications and Transferability

Shan23Chen

Feb 18, 2025, 10:14 PM

5 points

0 comments · 1 min read · LW link

(arxiv.org)

A fable on AI x-risk

bgaesop

Feb 18, 2025, 8:15 PM

8 points

4 comments · 1 min read · LW link

The Unearned Privilege We Rarely Discuss: Cognitive Capability

DiegoRojas

Feb 18, 2025, 8:06 PM

−21 points

7 comments · 3 min read · LW link

Call for Applications: XLab Summer Research Fellowship

JoNeedsSleep

Feb 18, 2025, 7:19 PM

9 points

0 comments · 1 min read · LW link

AISN #48: Utility Engineering and EnigmaEval

Corin Katzke and Dan H

Feb 18, 2025, 7:15 PM

4 points

0 comments · 4 min read · LW link

(newsletter.safe.ai)

Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems

Thane Ruthenis

Feb 18, 2025, 6:04 PM

32 points

10 comments · 4 min read · LW link

How accurate was my “Altered Traits” book review?

lsusr

Feb 18, 2025, 5:00 PM

43 points

3 comments · 3 min read · LW link

Medical Roundup #4

Zvi

Feb 18, 2025, 1:40 PM

24 points

3 comments · 10 min read · LW link

(thezvi.wordpress.com)

Dear AGI,

Nathan Young

Feb 18, 2025, 10:48 AM

88 points

11 comments · 3 min read · LW link

There are a lot of upcoming retreats/conferences between March and July (2025)

gergogaspar and ENAIS

Feb 18, 2025, 9:30 AM

6 points

0 comments · 1 min read · LW link

Sea Change

Charlie Sanders

Feb 18, 2025, 6:03 AM

−2 points

2 comments · 5 min read · LW link

(www.dailymicrofiction.com)

Born on Third Base: The Case for Inheriting Nothing and Building Everything

charlieoneill

Feb 18, 2025, 12:47 AM

−24 points

16 comments · 2 min read · LW link

Do models know when they are being evaluated?

Govind Pimpale, Giles, Joe Needham and Marius Hobbhahn

Feb 17, 2025, 11:13 PM

59 points

8 comments · 12 min read · LW link

AGI Safety & Alignment @ Google DeepMind is hiring

Rohin Shah

Feb 17, 2025, 9:11 PM

102 points

19 comments · 10 min read · LW link

The Peeperi (unfinished) - By Katja Grace

Nathan Young

Feb 17, 2025, 7:33 PM

22 points

0 comments · 3 min read · LW link

(docs.google.com)

Progress links and short notes, 2025-02-17

jasoncrawford

Feb 17, 2025, 7:18 PM

8 points

0 comments · 7 min read · LW link

(newsletter.rootsofprogress.org)

Claude 3.5 Sonnet (New)’s AGI scenario

Nathan Young

Feb 17, 2025, 6:47 PM

5 points

2 comments · 5 min read · LW link

Talking to laymen about AI development

David Steel

Feb 17, 2025, 6:42 PM

8 points

0 comments · 1 min read · LW link

On the Rebirth of Aristocracy in the American Regime

shawkisukkar

Feb 17, 2025, 4:18 PM

−16 points

3 comments · 9 min read · LW link

(shawkisukkar.substack.com)

Ascetic hedonism

dkl9

Feb 17, 2025, 3:56 PM

15 points

9 comments · 2 min read · LW link

(dkl9.net)