Formal Philosophy and Alignment Possible Projects
Whispermute · 30 Jun 2022 10:42 UTC · 31 points · 5 comments · 8 min read · LW link

What Is The True Name of Modularity?
TheMcDouglas, Lblack and Avery · 1 Jul 2022 14:55 UTC · 19 points · 3 comments · 12 min read · LW link

Call For Distillers
johnswentworth · 4 Apr 2022 18:25 UTC · 186 points · 36 comments · 3 min read · LW link

[Linkpost] Existential Risk Analysis in Empirical Research Papers
Dan Hendrycks · 2 Jul 2022 0:09 UTC · 29 points · 0 comments · 1 min read · LW link (arxiv.org)

Announcing the Inverse Scaling Prize ($250k Prize Pool)
Ethan Perez, irmckenzie and Sam Bowman · 27 Jun 2022 15:58 UTC · 157 points · 12 comments · 7 min read · LW link

AXRP Episode 16 - Preparing for Debate AI with Geoffrey Irving
DanielFilan · 1 Jul 2022 22:20 UTC · 11 points · 0 comments · 37 min read · LW link

Latent Adversarial Training
Adam Jermyn · 29 Jun 2022 20:04 UTC · 18 points · 3 comments · 5 min read · LW link

Exploring Mild Behaviour in Embedded Agents
Megan Kinniment · 27 Jun 2022 18:56 UTC · 19 points · 3 comments · 18 min read · LW link

A descriptive, not prescriptive, overview of current AI Alignment Research
Jan, Logan Riggs, jacquesthibs and janus · 6 Jun 2022 21:59 UTC · 94 points · 17 comments · 7 min read · LW link

Will Capabilities Generalise More?
Ramana Kumar · 29 Jun 2022 17:12 UTC · 52 points · 10 comments · 4 min read · LW link

Utility Maximization = Description Length Minimization
johnswentworth · 18 Feb 2021 18:04 UTC · 166 points · 38 comments · 5 min read · LW link

[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes · 17 May 2022 15:11 UTC · 70 points · 10 comments · 14 min read · LW link

Why Subagents?
johnswentworth · 1 Aug 2019 22:17 UTC · 154 points · 38 comments · 7 min read · LW link · 1 review

Where I agree and disagree with Eliezer
paulfchristiano · 19 Jun 2022 19:15 UTC · 684 points · 191 comments · 20 min read · LW link

The Big Picture Of Alignment (Talk Part 2)
johnswentworth · 25 Feb 2022 2:53 UTC · 32 points · 12 comments · 1 min read · LW link (www.youtube.com)

Open Problems in Negative Side Effect Minimization
Fabian Schimpf and Lukas Fluri · 6 May 2022 9:37 UTC · 12 points · 4 comments · 17 min read · LW link

A central AI alignment problem: capabilities generalization, and the sharp left turn
So8res · 15 Jun 2022 13:10 UTC · 200 points · 36 comments · 10 min read · LW link

The Case for a Journal of AI Alignment
adamShimi · 9 Jan 2021 18:13 UTC · 45 points · 31 comments · 4 min read · LW link

Optimality is the tiger, and agents are its teeth
Veedrac · 2 Apr 2022 0:46 UTC · 172 points · 28 comments · 16 min read · LW link

AGI Ruin: A List of Lethalities
Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC · 666 points · 629 comments · 30 min read · LW link