Language models seem to be much better than humans at next-token prediction
Buck, Fabien and LawrenceC
11 Aug 2022 17:45 UTC, 128 points, 52 comments, 13 min read
«Boundaries», Part 1: a key missing concept from utility theory
Andrew_Critch
26 Jul 2022 23:03 UTC, 113 points, 16 comments, 7 min read
Changing the world through slack & hobbies
Steven Byrnes
21 Jul 2022 18:11 UTC, 228 points, 14 comments, 10 min read
What should you change in response to an “emergency”? And AI risk
AnnaSalamon
18 Jul 2022 1:11 UTC, 290 points, 59 comments, 6 min read
Humans provide an untapped wealth of evidence about alignment
TurnTrout and Quintin Pope
14 Jul 2022 2:31 UTC, 168 points, 92 comments, 10 min read
On how various plans miss the hard bits of the alignment challenge
So8res
12 Jul 2022 2:49 UTC, 248 points, 81 comments, 29 min read
ITT-passing and civility are good; “charity” is bad; steelmanning is niche
Rob Bensinger
5 Jul 2022 0:15 UTC, 137 points, 32 comments, 6 min read
Looking back on my alignment PhD
TurnTrout
1 Jul 2022 3:19 UTC, 283 points, 58 comments, 11 min read
It’s Probably Not Lithium
Natália Coelho Mendonça
28 Jun 2022 21:24 UTC, 412 points, 179 comments, 28 min read
What Are You Tracking In Your Head?
johnswentworth
28 Jun 2022 19:30 UTC, 215 points, 73 comments, 4 min read
Nonprofit Boards are Weird
HoldenKarnofsky
23 Jun 2022 14:40 UTC, 149 points, 25 comments, 20 min read (www.cold-takes.com)
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
elspood
21 Jun 2022 23:55 UTC, 312 points, 40 comments, 7 min read
Where I agree and disagree with Eliezer
paulfchristiano
19 Jun 2022 19:15 UTC, 738 points, 202 comments, 20 min read
Humans are very reliable agents
alyssavance
16 Jun 2022 22:02 UTC, 249 points, 35 comments, 3 min read
AGI Ruin: A List of Lethalities
Eliezer Yudkowsky
5 Jun 2022 22:05 UTC, 681 points, 638 comments, 30 min read
Public beliefs vs. Private beliefs
Eli Tyre
1 Jun 2022 21:33 UTC, 132 points, 25 comments, 5 min read
Six Dimensions of Operational Adequacy in AGI Projects
Eliezer Yudkowsky
30 May 2022 17:00 UTC, 263 points, 65 comments, 13 min read
Benign Boundary Violations
Duncan_Sabien
26 May 2022 6:48 UTC, 197 points, 85 comments, 18 min read
Visible Homelessness in SF: A Quick Breakdown of Causes
alyssavance
25 May 2022 1:40 UTC, 194 points, 40 comments, 2 min read
[Intro to brain-like-AGI safety] 15. Conclusion: Open problems, how to help, AMA
Steven Byrnes
17 May 2022 15:11 UTC, 81 points, 11 comments, 14 min read