Superrational Agents Kelly Bet Influence! — abramdemski, 16 Apr 2021 22:08 UTC (34 points, 4 comments, 5 min read) [LW link]
Computing Natural Abstractions: Linear Approximation — johnswentworth, 15 Apr 2021 17:47 UTC (33 points, 22 comments, 7 min read) [LW link]
[AN #146]: Plausible stories of how we might fail to avert an existential catastrophe — rohinmshah, 14 Apr 2021 17:30 UTC (15 points, 1 comment, 8 min read) [LW link] (mailchi.mp)
Intermittent Distillations #2 — Mark Xu, 14 Apr 2021 6:47 UTC (23 points, 4 comments, 9 min read) [LW link]
Identifiability Problem for Superrational Decision Theories — Bunthut, 9 Apr 2021 20:33 UTC (17 points, 13 comments, 2 min read) [LW link]
Opinions on Interpretable Machine Learning and 70 Summaries of Recent Papers — lifelonglearner and Peter Hase, 9 Apr 2021 19:19 UTC (109 points, 10 comments, 102 min read) [LW link]
My Current Take on Counterfactuals — abramdemski, 9 Apr 2021 17:51 UTC (49 points, 13 comments, 24 min read) [LW link]
[AN #145]: Our three year anniversary! — rohinmshah, 9 Apr 2021 17:48 UTC (19 points, 0 comments, 8 min read) [LW link] (mailchi.mp)
Why unriggable *almost* implies uninfluenceable — Stuart_Armstrong, 9 Apr 2021 17:07 UTC (11 points, 0 comments, 4 min read) [LW link]
AXRP Episode 6 - Debate and Imitative Generalization with Beth Barnes — DanielFilan, 8 Apr 2021 21:20 UTC (23 points, 3 comments, 59 min read) [LW link]
A possible preference algorithm — Stuart_Armstrong, 8 Apr 2021 18:25 UTC (22 points, 0 comments, 4 min read) [LW link]
If you don’t design for extrapolation, you’ll extrapolate poorly—possibly fatally — Stuart_Armstrong, 8 Apr 2021 18:10 UTC (17 points, 0 comments, 4 min read) [LW link]
Solving the whole AGI control problem, version 0.0001 — Steven Byrnes, 8 Apr 2021 15:14 UTC (41 points, 4 comments, 26 min read) [LW link]
Another (outer) alignment failure story — paulfchristiano, 7 Apr 2021 20:12 UTC (127 points, 19 comments, 12 min read) [LW link]
Which counterfactuals should an AI follow? — Stuart_Armstrong, 7 Apr 2021 16:47 UTC (19 points, 5 comments, 7 min read) [LW link]
Alignment Newsletter Three Year Retrospective — rohinmshah, 7 Apr 2021 14:39 UTC (54 points, 0 comments, 5 min read) [LW link]
Testing The Natural Abstraction Hypothesis: Project Intro — johnswentworth, 6 Apr 2021 21:24 UTC (106 points, 15 comments, 6 min read) [LW link]
Reflective Bayesianism — abramdemski, 6 Apr 2021 19:48 UTC (48 points, 27 comments, 13 min read) [LW link]
The Many Faces of Infra-Beliefs — Diffractor, 6 Apr 2021 10:43 UTC (15 points, 0 comments, 63 min read) [LW link]
[Question] How do scaling laws work for fine-tuning? — Daniel Kokotajlo, 4 Apr 2021 12:18 UTC (24 points, 10 comments, 1 min read) [LW link]