LessWrong Archive: November 2022, Page 2
- Far-UVC Light Update: No, LEDs are not around the corner (tweetstorm) by Davidmanheim (Nov 2, 2022, 12:57 PM) · 73 points · 27 comments · 4 min read · LW link (twitter.com)
- Distinguishing test from training by So8res (Nov 29, 2022, 9:41 PM) · 72 points · 11 comments · 6 min read · LW link
- Don’t design agents which exploit adversarial inputs by TurnTrout and Garrett Baker (Nov 18, 2022, 1:48 AM) · 72 points · 64 comments · 12 min read · LW link
- My take on Jacob Cannell’s take on AGI safety by Steven Byrnes (Nov 28, 2022, 2:01 PM) · 72 points · 15 comments · 30 min read · LW link · 1 review
- Update to Mysteries of mode collapse: text-davinci-002 not RLHF by janus (Nov 19, 2022, 11:51 PM) · 71 points · 8 comments · 2 min read · LW link
- Real-Time Research Recording: Can a Transformer Re-Derive Positional Info? by Neel Nanda (Nov 1, 2022, 11:56 PM) · 69 points · 16 comments · 1 min read · LW link (youtu.be)
- Career Scouting: Dentistry by koratkar (Nov 20, 2022, 3:55 PM) · 69 points · 5 comments · 5 min read · LW link (careerscouting.substack.com)
- Why Would AI “Aim” To Defeat Humanity? by HoldenKarnofsky (Nov 29, 2022, 7:30 PM) · 69 points · 10 comments · 33 min read · LW link (www.cold-takes.com)
- Deontology and virtue ethics as “effective theories” of consequentialist ethics by Jan_Kulveit (Nov 17, 2022, 2:11 PM) · 68 points · 9 comments · LW link · 1 review
- All AGI Safety questions welcome (especially basic ones) [~monthly thread] by Robert Miles (Nov 1, 2022, 11:23 PM) · 68 points · 105 comments · 2 min read · LW link
- The First Filter by adamShimi and Gabriel Alfour (Nov 26, 2022, 7:37 PM) · 67 points · 5 comments · 1 min read · LW link
- 2022 LessWrong Census? by SurfingOrca (Nov 7, 2022, 5:16 AM) · 67 points · 13 comments · 1 min read · LW link
- Against “Classic Style” by Cleo Nardo (Nov 23, 2022, 10:10 PM) · 67 points · 30 comments · 4 min read · LW link
- Clarifying wireheading terminology by leogao (Nov 24, 2022, 4:53 AM) · 66 points · 6 comments · 1 min read · LW link
- Alignment allows “nonrobust” decision-influences and doesn’t require robust grading by TurnTrout (Nov 29, 2022, 6:23 AM) · 62 points · 41 comments · 15 min read · LW link
- Announcing AI safety Mentors and Mentees by Marius Hobbhahn (Nov 23, 2022, 3:21 PM) · 62 points · 7 comments · 10 min read · LW link
- Could a single alien message destroy us? by Writer and Matthew Barnett (Nov 25, 2022, 7:32 AM) · 61 points · 23 comments · 6 min read · LW link (youtu.be)
- Against a General Factor of Doom by Jeffrey Heninger (Nov 23, 2022, 4:50 PM) · 61 points · 19 comments · 4 min read · LW link · 1 review · (aiimpacts.org)
- New Frontiers in Mojibake by Adam Scherlis (Nov 26, 2022, 2:37 AM) · 60 points · 7 comments · 6 min read · LW link · 1 review · (adam.scherlis.com)
- FTX will probably be sold at a steep discount. What we know and some forecasts on what will happen next by Nathan Young (Nov 9, 2022, 2:14 AM) · 60 points · 21 comments · LW link
- The Least Controversial Application of Geometric Rationality by Scott Garrabrant (Nov 25, 2022, 4:50 PM) · 60 points · 22 comments · 4 min read · LW link
- What’s the Deal with Elon Musk and Twitter? by Zvi (Nov 7, 2022, 1:50 PM) · 60 points · 13 comments · 31 min read · LW link (thezvi.wordpress.com)
- Open technical problem: A Quinean proof of Löb’s theorem, for an easier cartoon guide by Andrew_Critch (Nov 24, 2022, 9:16 PM) · 58 points · 35 comments · 3 min read · LW link · 1 review
- Humans do acausal coordination all the time by Adam Jermyn (Nov 2, 2022, 2:40 PM) · 57 points · 35 comments · 3 min read · LW link
- Some advice on independent research by Marius Hobbhahn (Nov 8, 2022, 2:46 PM) · 56 points · 5 comments · 10 min read · LW link
- A philosopher’s critique of RLHF by TW123 (Nov 7, 2022, 2:42 AM) · 55 points · 8 comments · 2 min read · LW link
- Announcing Nonlinear Emergency Funding by KatWoods (Nov 13, 2022, 7:02 PM) · 54 points · 0 comments · LW link
- Human-level Diplomacy was my fire alarm by Lao Mein (Nov 23, 2022, 10:05 AM) · 54 points · 15 comments · 3 min read · LW link
- Kelsey Piper’s recent interview of SBF by agucova (Nov 16, 2022, 8:30 PM) · 51 points · 29 comments · LW link
- What’s the Alternative to Independence? by jefftk (Nov 13, 2022, 3:30 PM) · 50 points · 3 comments · 1 min read · LW link (www.jefftk.com)
- Human-level Full-Press Diplomacy (some bare facts). by Cleo Nardo (Nov 22, 2022, 8:59 PM) · 50 points · 7 comments · 3 min read · LW link
- Noting an unsubstantiated communal belief about the FTX disaster by Yitz (Nov 13, 2022, 5:37 AM) · 50 points · 52 comments · LW link
- Developer experience for the motivation by Adam Zerner (Nov 16, 2022, 7:12 AM) · 49 points · 7 comments · 4 min read · LW link
- “Rudeness”, a useful coordination mechanic by Raemon (Nov 11, 2022, 10:27 PM) · 49 points · 20 comments · 2 min read · LW link
- Don’t align agents to evaluations of plans by TurnTrout (Nov 26, 2022, 9:16 PM) · 48 points · 49 comments · 18 min read · LW link
- A Mystery About High Dimensional Concept Encoding by Fabien Roger (Nov 3, 2022, 5:05 PM) · 46 points · 13 comments · 7 min read · LW link
- Information Markets by eva_ (Nov 2, 2022, 1:24 AM) · 46 points · 6 comments · 12 min read · LW link
- A Short Dialogue on the Meaning of Reward Functions by Leon Lang, Quintin Pope and peligrietzer (Nov 19, 2022, 9:04 PM) · 45 points · 0 comments · 3 min read · LW link
- For ELK truth is mostly a distraction by c.trout (Nov 4, 2022, 9:14 PM) · 44 points · 0 comments · 21 min read · LW link
- The FTX Saga—Simplified by Annapurna (Nov 16, 2022, 2:42 AM) · 44 points · 10 comments · 7 min read · LW link (jorgevelez.substack.com)
- Spectrum of Independence by jefftk (Nov 5, 2022, 2:40 AM) · 43 points · 7 comments · 1 min read · LW link (www.jefftk.com)
- Rationalist Town Hall: FTX Fallout Edition (RSVP Required) by Ben Pace (Nov 23, 2022, 1:38 AM) · 43 points · 13 comments · 2 min read · LW link
- The biological function of love for non-kin is to gain the trust of people we cannot deceive by chaosmage (Nov 7, 2022, 8:26 PM) · 43 points · 3 comments · 8 min read · LW link
- Weekly Roundup #4 by Zvi (Nov 4, 2022, 3:00 PM) · 42 points · 1 comment · 6 min read · LW link (thezvi.wordpress.com)
- The optimal angle for a solar boiler is different than for a solar panel by Yair Halberstadt (Nov 10, 2022, 10:32 AM) · 42 points · 4 comments · 2 min read · LW link
- We must be very clear: fraud in the service of effective altruism is unacceptable by evhub (Nov 10, 2022, 11:31 PM) · 42 points · 56 comments · LW link
- A newcomer’s guide to the technical AI safety field by zeshen (Nov 4, 2022, 2:29 PM) · 42 points · 3 comments · 10 min read · LW link
- Why square errors? by Aprillion (Nov 26, 2022, 1:40 PM) · 41 points · 11 comments · 2 min read · LW link
- Counterfactability by Scott Garrabrant (Nov 7, 2022, 5:39 AM) · 40 points · 5 comments · 11 min read · LW link
- Scott Aaronson on “Reform AI Alignment” by Shmi (Nov 20, 2022, 10:20 PM) · 39 points · 17 comments · 1 min read · LW link (scottaaronson.blog)