Mysteries of mode collapse
janus
Nov 8, 2022, 10:37 AM
284
points
57
comments
14
min read
LW
link
1
review
What it’s like to dissect a cadaver
Alok Singh
Nov 10, 2022, 6:40 AM
208
points
24
comments
5
min read
LW
link
(alok.github.io)
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren
and
Sid Black
Nov 28, 2022, 12:54 PM
200
points
34
comments
31
min read
LW
link
I Converted Book I of The Sequences Into A Zoomer-Readable Format
dkirmani
Nov 10, 2022, 2:59 AM
200
points
32
comments
2
min read
LW
link
Tyranny of the Epistemic Majority
Scott Garrabrant
Nov 22, 2022, 5:19 PM
192
points
13
comments
9
min read
LW
link
1
review
Conjecture: a retrospective after 8 months of work
Connor Leahy
,
Sid Black
,
Gabriel Alfour
and
Chris Scammell
Nov 23, 2022, 5:10 PM
180
points
9
comments
8
min read
LW
link
Geometric Rationality is Not VNM Rational
Scott Garrabrant
Nov 27, 2022, 7:36 PM
176
points
27
comments
3
min read
LW
link
Planes are still decades away from displacing most bird jobs
guzey
Nov 25, 2022, 4:49 PM
168
points
13
comments
3
min read
LW
link
The Geometric Expectation
Scott Garrabrant
Nov 23, 2022, 6:05 PM
159
points
22
comments
4
min read
LW
link
Mechanistic anomaly detection and ELK
paulfchristiano
Nov 25, 2022, 6:50 PM
138
points
22
comments
21
min read
LW
link
(ai-alignment.com)
The Alignment Community Is Culturally Broken
sudo
Nov 13, 2022, 6:53 PM
136
points
68
comments
2
min read
LW
link
AI will change the world, but won’t take it over by playing “3-dimensional chess”.
boazbarak
and
benedelman
Nov 22, 2022, 6:57 PM
134
points
97
comments
24
min read
LW
link
Sadly, FTX
Zvi
Nov 17, 2022, 2:30 PM
133
points
18
comments
47
min read
LW
link
(thezvi.wordpress.com)
On the Diplomacy AI
Zvi
Nov 28, 2022, 1:20 PM
127
points
29
comments
11
min read
LW
link
(thezvi.wordpress.com)
Clarifying AI X-risk
zac_kenton
,
Rohin Shah
,
David Lindner
,
Vikrant Varma
,
Vika
,
Mary Phuong
,
Ramana Kumar
and
Elliot Catt
Nov 1, 2022, 11:03 AM
127
points
24
comments
4
min read
LW
link
1
review
Geometric Exploration, Arithmetic Exploitation
Scott Garrabrant
Nov 24, 2022, 3:36 PM
126
points
5
comments
7
min read
LW
link
Utilitarianism Meets Egalitarianism
Scott Garrabrant
Nov 21, 2022, 7:00 PM
121
points
16
comments
6
min read
LW
link
1
review
Here’s the exit.
Valentine
Nov 21, 2022, 6:07 PM
118
points
180
comments
10
min read
LW
link
5
reviews
Speculation on Current Opportunities for Unusually High Impact in Global Health
johnswentworth
Nov 11, 2022, 8:47 PM
114
points
31
comments
4
min read
LW
link
How could we know that an AGI system will have good consequences?
So8res
Nov 7, 2022, 10:42 PM
111
points
25
comments
5
min read
LW
link
Applying superintelligence without collusion
Eric Drexler
Nov 8, 2022, 6:08 PM
109
points
63
comments
4
min read
LW
link
What I Learned Running Refine
adamShimi
Nov 24, 2022, 2:49 PM
108
points
5
comments
4
min read
LW
link
Instrumental convergence is what makes general intelligence possible
tailcalled
Nov 11, 2022, 4:38 PM
105
points
11
comments
4
min read
LW
link
Caution when interpreting Deepmind’s In-context RL paper
Sam Marks
Nov 1, 2022, 2:42 AM
105
points
8
comments
4
min read
LW
link
LW Beta Feature: Side-Comments
jimrandomh
Nov 24, 2022, 1:55 AM
103
points
47
comments
1
min read
LW
link
LessWrong readers are invited to apply to the Lurkshop
Jonas V
and
GradientDissenter
Nov 22, 2022, 9:19 AM
101
points
41
comments
3
min read
LW
link
Instead of technical research, more people should focus on buying time
Orpheus16
,
OliviaJ
and
Thomas Larsen
Nov 5, 2022, 8:43 PM
100
points
45
comments
14
min read
LW
link
ARC paper: Formalizing the presumption of independence
Erik Jenner
Nov 20, 2022, 1:22 AM
97
points
2
comments
2
min read
LW
link
(arxiv.org)
Searching for Search
NicholasKees
and
janus
Nov 28, 2022, 3:31 PM
97
points
9
comments
14
min read
LW
link
1
review
Trying to Make a Treacherous Mesa-Optimizer
MadHatter
Nov 9, 2022, 6:07 PM
95
points
14
comments
4
min read
LW
link
(attentionspan.blog)
Meta AI announces Cicero: Human-Level Diplomacy play (with dialogue)
Jacy Reese Anthis
Nov 22, 2022, 4:50 PM
93
points
64
comments
1
min read
LW
link
(www.science.org)
Conjecture Second Hiring Round
Connor Leahy
,
Sid Black
,
Gabriel Alfour
and
Chris Scammell
Nov 23, 2022, 5:11 PM
92
points
0
comments
1
min read
LW
link
When AI solves a game, focus on the game’s mechanics, not its theme.
Cleo Nardo
Nov 23, 2022, 7:16 PM
89
points
7
comments
2
min read
LW
link
Current themes in mechanistic interpretability research
Lee Sharkey
,
Sid Black
and
beren
Nov 16, 2022, 2:14 PM
89
points
2
comments
12
min read
LW
link
By Default, GPTs Think In Plain Sight
Fabien Roger
Nov 19, 2022, 7:15 PM
88
points
36
comments
9
min read
LW
link
Announcing the Progress Forum
jasoncrawford
Nov 17, 2022, 7:26 PM
83
points
9
comments
1
min read
LW
link
Always know where your abstractions break
lsusr
Nov 27, 2022, 6:32 AM
82
points
6
comments
2
min read
LW
link
Results from the interpretability hackathon
Esben Kran
and
Neel Nanda
Nov 17, 2022, 2:51 PM
81
points
0
comments
6
min read
LW
link
(alignmentjam.com)
Exams-Only Universities
Mati_Roy
Nov 6, 2022, 10:05 PM
80
points
40
comments
2
min read
LW
link
Threat Model Literature Review
zac_kenton
,
Rohin Shah
,
David Lindner
,
Vikrant Varma
,
Vika
,
Mary Phuong
,
Ramana Kumar
and
Elliot Catt
Nov 1, 2022, 11:03 AM
78
points
4
comments
25
min read
LW
link
What is epigenetics?
Metacelsus
6 Nov 2022 1:24 UTC
78
points
4
comments
6
min read
LW
link
(denovo.substack.com)
Follow up to medical miracle
Elizabeth
4 Nov 2022 18:00 UTC
76
points
5
comments
6
min read
LW
link
(acesounderglass.com)
Elastic Productivity Tools
Simon Berens
19 Nov 2022 21:59 UTC
76
points
8
comments
2
min read
LW
link
(simonberens.me)
Disagreement with bio anchors that lead to shorter timelines
Marius Hobbhahn
16 Nov 2022 14:40 UTC
75
points
17
comments
7
min read
LW
link
1
review
Engineering Monosemanticity in Toy Models
Adam Jermyn
,
evhub
and
Nicholas Schiefer
18 Nov 2022 1:43 UTC
75
points
7
comments
3
min read
LW
link
(arxiv.org)
Will we run out of ML data? Evidence from projecting dataset size trends
Pablo Villalobos
14 Nov 2022 16:42 UTC
75
points
12
comments
2
min read
LW
link
(epochai.org)
K-types vs T-types — what priors do you have?
Cleo Nardo
3 Nov 2022 11:29 UTC
74
points
25
comments
7
min read
LW
link
Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility
Orpheus16
and
OliviaJ
22 Nov 2022 22:19 UTC
74
points
20
comments
4
min read
LW
link
Takeaways from a survey on AI alignment resources
DanielFilan
5 Nov 2022 23:40 UTC
73
points
10
comments
6
min read
LW
link
1
review
(danielfilan.com)
Respecting your Local Preferences
Scott Garrabrant
26 Nov 2022 19:04 UTC
73
points
1
comment
4
min read
LW
link