Book Review: How Minds Change
bc4026bd4aaa5b7fe · 25 May 2023 17:55 UTC · 221 points · 29 comments · 15 min read
Focus on the places where you feel shocked everyone’s dropping the ball
So8res · 2 Feb 2023 0:27 UTC · 380 points · 58 comments · 4 min read
Steering GPT-2-XL by adding an activation vector
TurnTrout, Monte M, David Udell, lisathiergart and Ulisse Mini · 13 May 2023 18:42 UTC · 378 points · 73 comments · 50 min read
How to have Polygenically Screened Children
GeneSmith · 7 May 2023 16:01 UTC · 280 points · 67 comments · 29 min read
Security Mindset: Lessons from 20+ years of Software Security Failures Relevant to AGI Alignment
elspood · 21 Jun 2022 23:55 UTC · 341 points · 41 comments · 7 min read
Study Guide
johnswentworth · 6 Nov 2021 1:23 UTC · 257 points · 44 comments · 16 min read
Why I think strong general AI is coming soon
porby · 28 Sep 2022 5:40 UTC · 312 points · 138 comments · 34 min read
Can we efficiently distinguish different mechanisms?
paulfchristiano · 27 Dec 2022 0:20 UTC · 86 points · 30 comments · 16 min read · (ai-alignment.com)
(My understanding of) What Everyone in Technical Alignment is Doing and Why
Thomas Larsen and elifland · 29 Aug 2022 1:23 UTC · 388 points · 87 comments · 38 min read
AGI Ruin: A List of Lethalities
Eliezer Yudkowsky · 5 Jun 2022 22:05 UTC · 839 points · 670 comments · 30 min read
Lies Told To Children
Eliezer Yudkowsky · 14 Apr 2022 11:25 UTC · 348 points · 89 comments · 7 min read
More information about the dangerous capability evaluations we did with GPT-4 and Claude.
Beth Barnes · 19 Mar 2023 0:25 UTC · 233 points · 52 comments · 8 min read · (evals.alignment.org)
Science in a High-Dimensional World
johnswentworth · 8 Jan 2021 17:52 UTC · 269 points · 53 comments · 7 min read · 1 review
When is Goodhart catastrophic?
Drake Thomas and Thomas Kwa · 9 May 2023 3:59 UTC · 145 points · 18 comments · 8 min read
“Other people are wrong” vs “I am right”
Buck · 22 Feb 2019 20:01 UTC · 241 points · 20 comments · 9 min read · 2 reviews
Rereading Atlas Shrugged
Vaniver · 28 Jul 2020 18:54 UTC · 153 points · 28 comments · 13 min read · 1 review
Predictable updating about AI risk
Joe Carlsmith · 8 May 2023 21:53 UTC · 272 points · 20 comments · 36 min read
Real-Life Examples of Prediction Systems Interfering with the Real World (Predict-O-Matic Problems)
NunoSempere · 3 Dec 2020 22:00 UTC · 122 points · 28 comments · 9 min read
What 2026 looks like
Daniel Kokotajlo · 6 Aug 2021 16:14 UTC · 414 points · 146 comments · 16 min read · 1 review
The shard theory of human values
Quintin Pope and TurnTrout · 4 Sep 2022 4:28 UTC · 232 points · 61 comments · 24 min read