[Linkpost] Jan Leike on three kinds of alignment taxes · Orpheus16 · Jan 6, 2023, 11:57 PM · 27 points · 2 comments · 3 min read · LW link (aligned.substack.com)

The Limit of Language Models · DragonGod · Jan 6, 2023, 11:53 PM · 44 points · 26 comments · 4 min read · LW link

Why didn't we get the four-hour workday? · jasoncrawford · Jan 6, 2023, 9:29 PM · 141 points · 34 comments · 6 min read · LW link (rootsofprogress.org)

AI security might be helpful for AI alignment · Igor Ivanov · Jan 6, 2023, 8:16 PM · 36 points · 1 comment · 2 min read · LW link

Categorizing failures as "outer" or "inner" misalignment is often confused · Rohin Shah · Jan 6, 2023, 3:48 PM · 93 points · 21 comments · 8 min read · LW link

Definitions of "objective" should be Probable and Predictive · Rohin Shah · Jan 6, 2023, 3:40 PM · 43 points · 27 comments · 12 min read · LW link

200 COP in MI: Techniques, Tooling and Automation · Neel Nanda · Jan 6, 2023, 3:08 PM · 13 points · 0 comments · 15 min read · LW link

Ball Square Station and Ridership Maximization · jefftk · Jan 6, 2023, 1:20 PM · 13 points · 0 comments · 1 min read · LW link (www.jefftk.com)

Childhood Roundup #1 · Zvi · Jan 6, 2023, 1:00 PM · 84 points · 27 comments · 8 min read · LW link (thezvi.wordpress.com)

AI improving AI [MLAISU W01!] · Esben Kran · Jan 6, 2023, 11:13 AM · 5 points · 0 comments · 4 min read · LW link (newsletter.apartresearch.com)

AI Safety Camp, Virtual Edition 2023 · Linda Linsefors · Jan 6, 2023, 11:09 AM · 40 points · 10 comments · 3 min read · LW link (aisafety.camp)

Kakistocuriosity · LVSN · Jan 6, 2023, 7:38 AM · 7 points · 3 comments · 1 min read · LW link

AI Safety Camp: Machine Learning for Scientific Discovery · Eleni Angelou · Jan 6, 2023, 3:21 AM · 3 points · 0 comments · 1 min read · LW link

Metaculus Year in Review: 2022 · ChristianWilliams · Jan 6, 2023, 1:23 AM · 6 points · 0 comments · LW link

UDASSA · Jacob Falkovich · Jan 6, 2023, 1:07 AM · 27 points · 8 comments · 10 min read · LW link

The Involuntary Pacifists · Capybasilisk · Jan 6, 2023, 12:28 AM · 11 points · 3 comments · 2 min read · LW link

Get an Electric Toothbrush. · Cervera · Jan 5, 2023, 9:08 PM · 21 points · 4 comments · 1 min read · LW link

Discursive Competence in ChatGPT, Part 1: Talking with Dragons · Bill Benzon · Jan 5, 2023, 9:01 PM · 2 points · 0 comments · 6 min read · LW link

Transformative AI issues (not just misalignment): an overview · HoldenKarnofsky · Jan 5, 2023, 8:20 PM · 34 points · 6 comments · 18 min read · LW link (www.cold-takes.com)

How to slow down scientific progress, according to Leo Szilard · jasoncrawford · Jan 5, 2023, 6:26 PM · 134 points · 18 comments · 2 min read · LW link (rootsofprogress.org)

Paper: Superposition, Memorization, and Double Descent (Anthropic) · LawrenceC · Jan 5, 2023, 5:54 PM · 53 points · 11 comments · 1 min read · LW link (transformer-circuits.pub)

Collapse Might Not Be Desirable · Dzoldzaya · Jan 5, 2023, 5:29 PM · −2 points · 9 comments · 2 min read · LW link

Singapore—Small casual dinner in Chinatown #6 · Joe Rocca · Jan 5, 2023, 5:00 PM · 2 points · 1 comment · 1 min read · LW link

[Question] Image generation and alignment · rpglover64 · Jan 5, 2023, 4:05 PM · 3 points · 3 comments · 1 min read · LW link

[Question] Machine Learning vs Differential Privacy · Ilio · Jan 5, 2023, 3:14 PM · 10 points · 10 comments · 1 min read · LW link

Covid 1/5/23: Various XBB Takes · Zvi · Jan 5, 2023, 2:20 PM · 21 points · 18 comments · 15 min read · LW link (thezvi.wordpress.com)

Running by Default · jefftk · Jan 5, 2023, 1:50 PM · 112 points · 40 comments · 1 min read · LW link (www.jefftk.com)

PSA: reward is part of the habit loop too · Alok Singh · Jan 5, 2023, 11:00 AM · 22 points · 2 comments · 1 min read · LW link (alok.github.io)

Infohazards vs Fork Hazards · jimrandomh · Jan 5, 2023, 9:45 AM · 68 points · 16 comments · 1 min read · LW link

Monthly Shorts 12/22 · Celer · Jan 5, 2023, 7:20 AM · 5 points · 2 comments · 1 min read · LW link (keller.substack.com)

The 2021 Review Phase · Raemon · Jan 5, 2023, 7:12 AM · 34 points · 7 comments · 3 min read · LW link

Illusion of truth effect and Ambiguity effect: Bias in Evaluating AGI X-Risks · Remmelt · Jan 5, 2023, 4:05 AM · −13 points · 2 comments · LW link

When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️ · Jeffrey Ladish · Jan 5, 2023, 1:21 AM · 25 points · 10 comments · 2 min read · LW link

Why I'm joining Anthropic · evhub · Jan 5, 2023, 1:12 AM · 118 points · 4 comments · 2 min read · LW link

Contra Common Knowledge · abramdemski · Jan 4, 2023, 10:50 PM · 52 points · 31 comments · 16 min read · LW link

Additional space complexity isn't always a useful metric · Brendan Long · Jan 4, 2023, 9:53 PM · 4 points · 3 comments · 3 min read · LW link (www.brendanlong.com)

List of links for getting into AI safety · zef · Jan 4, 2023, 7:45 PM · 6 points · 0 comments · 1 min read · LW link

Opening Facebook Links Externally · jefftk · Jan 4, 2023, 7:00 PM · 12 points · 3 comments · 1 min read · LW link (www.jefftk.com)

Conversational canyons · Henrik Karlsson · Jan 4, 2023, 6:55 PM · 59 points · 4 comments · 7 min read · LW link (escapingflatland.substack.com)

Progress links and tweets, 2023-01-04 · jasoncrawford · Jan 4, 2023, 6:23 PM · 15 points · 0 comments · 1 min read · LW link (rootsofprogress.org)

200 COP in MI: Analysing Training Dynamics · Neel Nanda · Jan 4, 2023, 4:08 PM · 16 points · 0 comments · 14 min read · LW link

What's up with ChatGPT and the Turing Test? · JoshuaFox and Zvi Schreiber · Jan 4, 2023, 3:37 PM · 13 points · 19 comments · 3 min read · LW link

2022 was the year AGI arrived (Just don't call it that) · Logan Zoellner · Jan 4, 2023, 3:19 PM · 101 points · 60 comments · 3 min read · LW link

From Simon's ant to machine learning, a parable · Bill Benzon · Jan 4, 2023, 2:37 PM · 6 points · 5 comments · 2 min read · LW link

Basic Facts about Language Model Internals · beren and Eric Winsor · Jan 4, 2023, 1:01 PM · 130 points · 19 comments · 9 min read · LW link

Ritual as the only tool for overwriting values and goals · mrcbarbier · Jan 4, 2023, 11:11 AM · 41 points · 24 comments · 32 min read · LW link

Normalcy bias and Base rate neglect: Bias in Evaluating AGI X-Risks · Remmelt · Jan 4, 2023, 3:16 AM · −16 points · 0 comments · LW link

Causal representation learning as a technique to prevent goal misgeneralization · PabloAMC · Jan 4, 2023, 12:07 AM · 21 points · 0 comments · 8 min read · LW link

What makes a probability question "well-defined"? (Part II: Bertrand's Paradox) · Noah Topper · Jan 3, 2023, 10:39 PM · 7 points · 3 comments · 9 min read · LW link (naivebayes.substack.com)

"AI" is an indexical · TW123 · Jan 3, 2023, 10:00 PM · 10 points · 0 comments · 6 min read · LW link (aiwatchtower.substack.com)