(The) Lightcone is nothing without its people: LW + Lighthaven’s big fundraiser
habryka | 30 Nov 2024 2:55 UTC | 612 points | 273 comments | 42 min read

LessWrong’s (first) album: I Have Been A Good Bing
habryka and kave | 1 Apr 2024 7:33 UTC | 576 points | 182 comments | 11 min read

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)
habryka | 16 Nov 2024 6:38 UTC | 540 points | 82 comments | 51 min read

Alignment Faking in Large Language Models
ryan_greenblatt, evhub, Carson Denison, Benjamin Wright, Fabien Roger, Monte M, Sam Marks, Johannes Treutlein, Sam Bowman and Buck | 18 Dec 2024 17:19 UTC | 496 points | 85 comments | 10 min read | 3 reviews

I would have shit in that alley, too
Declan Molony | 18 Jun 2024 4:41 UTC | 487 points | 143 comments | 4 min read | 2 reviews

The Best Tacit Knowledge Videos on Every Subject
Parker Conley | 31 Mar 2024 17:14 UTC | 453 points | 176 comments | 21 min read | 1 review

Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai | 16 Apr 2024 21:16 UTC | 432 points | 101 comments | 12 min read | 1 review

Failures in Kindness
silentbob | 26 Mar 2024 21:30 UTC | 432 points | 61 comments | 9 min read | 1 review

How I got 4.2M YouTube views without making a single video
Closed Limelike Curves | 3 Sep 2024 3:52 UTC | 426 points | 38 comments | 1 min read | 2 reviews

The hostile telepaths problem
Valentine | 27 Oct 2024 15:26 UTC | 424 points | 107 comments | 15 min read | 6 reviews

Reliable Sources: The Story of David Gerard
TracingWoodgrains | 10 Jul 2024 19:50 UTC | 403 points | 56 comments | 43 min read | 2 reviews

Survival without dignity
L Rudolf L | 4 Nov 2024 2:29 UTC | 397 points | 30 comments | 15 min read | 1 review | (nosetgauge.substack.com)

There is way too much serendipity
Malmesbury | 19 Jan 2024 19:37 UTC | 396 points | 58 comments | 7 min read | 1 review

My hour of memoryless lucidity
Eric Neyman | 4 May 2024 1:40 UTC | 381 points | 38 comments | 5 min read | 1 review | (ericneyman.wordpress.com)

Review: Planecrash
L Rudolf L | 27 Dec 2024 14:18 UTC | 374 points | 58 comments | 22 min read | 2 reviews | (nosetgauge.substack.com)

Notifications Received in 30 Minutes of Class
tanagrabeast | 26 May 2024 17:02 UTC | 374 points | 17 comments | 8 min read | 1 review

Safety isn’t safety without a social model (or: dispelling the myth of per se technical safety)
Andrew_Critch | 14 Jun 2024 0:16 UTC | 368 points | 41 comments | 4 min read | 3 reviews

Thoughts on seed oil
dynomight | 20 Apr 2024 12:29 UTC | 365 points | 131 comments | 17 min read | 1 review | (dynomight.net)

[April Fools’ Day] Introducing Open Asteroid Impact
Linch | 1 Apr 2024 8:14 UTC | 355 points | 35 comments | 1 min read | 3 reviews | (openasteroidimpact.org)

What Goes Without Saying
sarahconstantin | 20 Dec 2024 18:00 UTC | 355 points | 29 comments | 5 min read | 1 review | (sarahconstantin.substack.com)

You don’t know how bad most things are nor precisely how they’re bad.
Solenoid_Entity | 4 Aug 2024 14:12 UTC | 355 points | 52 comments | 5 min read | 1 review

Universal Basic Income and Poverty
Eliezer Yudkowsky | 26 Jul 2024 7:23 UTC | 354 points | 150 comments | 9 min read | 1 review

I got dysentery so you don’t have to
eukaryote | 22 Oct 2024 4:55 UTC | 340 points | 8 comments | 17 min read | 2 reviews | (eukaryotewritesblog.com)

Biological risk from the mirror world
jasoncrawford | 12 Dec 2024 19:07 UTC | 336 points | 39 comments | 7 min read | 1 review | (newsletter.rootsofprogress.org)

MIRI 2024 Communications Strategy
Gretta Duleba | 29 May 2024 19:33 UTC | 325 points | 218 comments | 7 min read

The Field of AI Alignment: A Postmortem, and What To Do About It
johnswentworth | 26 Dec 2024 18:48 UTC | 322 points | 176 comments | 8 min read | 3 reviews

Would catching your AIs trying to escape convince AI developers to slow down or undeploy?
Buck | 26 Aug 2024 16:46 UTC | 321 points | 78 comments | 4 min read | 1 review

Gentleness and the artificial Other
Joe Carlsmith | 2 Jan 2024 18:21 UTC | 321 points | 34 comments | 11 min read | 1 review

On green
Joe Carlsmith | 21 Mar 2024 17:38 UTC | 318 points | 46 comments | 31 min read | 3 reviews

Laziness death spirals
PatrickDFarley | 19 Sep 2024 15:58 UTC | 314 points | 45 comments | 8 min read | 2 reviews

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
evhub, Carson Denison, Meg, Monte M, David Duvenaud, Nicholas Schiefer and Ethan Perez | 12 Jan 2024 19:51 UTC | 310 points | 95 comments | 3 min read | (arxiv.org)

By default, capital will matter more than ever after AGI
L Rudolf L | 28 Dec 2024 17:52 UTC | 309 points | 108 comments | 16 min read | 2 reviews | (nosetgauge.substack.com)

Scale Was All We Needed, At First
GMM | 14 Feb 2024 1:49 UTC | 298 points | 35 comments | 8 min read | (aiacumen.substack.com)

Overview of strong human intelligence amplification methods
TsviBT | 8 Oct 2024 8:37 UTC | 298 points | 165 comments | 10 min read | 2 reviews

Orienting to 3 year AGI timelines
Nikola Jurkovic | 22 Dec 2024 1:15 UTC | 298 points | 63 comments | 8 min read | 2 reviews

Non-Disparagement Canaries for OpenAI
aysja and Adam Scholl | 30 May 2024 19:20 UTC | 290 points | 51 comments | 2 min read

The Online Sports Gambling Experiment Has Failed
Zvi | 11 Nov 2024 14:30 UTC | 290 points | 61 comments | 11 min read | 1 review | (thezvi.wordpress.com)

“No-one in my org puts money in their pension”
Tobes | 16 Feb 2024 18:33 UTC | 287 points | 17 comments | 9 min read | 1 review | (seekingtobejolly.substack.com)

Raising children on the eve of AI
juliawise | 15 Feb 2024 21:28 UTC | 286 points | 48 comments | 5 min read | 1 review

My Clients, The Liars
ymeskhout | 5 Mar 2024 21:06 UTC | 283 points | 92 comments | 7 min read | 2 reviews

The Great Data Integration Schlep
sarahconstantin | 13 Sep 2024 15:40 UTC | 279 points | 20 comments | 9 min read | (sarahconstantin.substack.com)

Truthseeking is the ground in which other principles grow
Elizabeth | 27 May 2024 1:09 UTC | 278 points | 18 comments | 16 min read | 2 reviews

My AI Model Delta Compared To Yudkowsky
johnswentworth | 10 Jun 2024 16:12 UTC | 277 points | 107 comments | 4 min read

The case for ensuring that powerful AIs are controlled
ryan_greenblatt and Buck | 24 Jan 2024 16:11 UTC | 275 points | 74 comments | 28 min read | 1 review

80,000 hours should remove OpenAI from the Job Board (and similar EA orgs should do similarly)
Raemon | 3 Jul 2024 20:34 UTC | 274 points | 71 comments | 3 min read

the case for CoT unfaithfulness is overstated
nostalgebraist | 29 Sep 2024 22:07 UTC | 269 points | 44 comments | 11 min read | 1 review

Express interest in an “FHI of the West”
habryka | 18 Apr 2024 3:32 UTC | 268 points | 41 comments | 3 min read

Getting 50% (SoTA) on ARC-AGI with GPT-4o
ryan_greenblatt | 17 Jun 2024 18:44 UTC | 267 points | 50 comments | 13 min read

Believing In
AnnaSalamon | 8 Feb 2024 7:06 UTC | 266 points | 59 comments | 13 min read | 4 reviews

Leaving MIRI, Seeking Funding
abramdemski | 8 Aug 2024 18:32 UTC | 264 points | 19 comments | 2 min read