ryan_greenblatt
Karma: 6,143
I work at Redwood Research.
The case for ensuring that powerful AIs are controlled
ryan_greenblatt and Buck | 24 Jan 2024 16:11 UTC | 245 points | 66 comments | 28 min read | LW link
How useful is mechanistic interpretability?
ryan_greenblatt, Neel Nanda, Buck and habryka | 1 Dec 2023 2:54 UTC | 155 points | 53 comments | 25 min read | LW link
Improving the Welfare of AIs: A Nearcasted Proposal
ryan_greenblatt | 30 Oct 2023 14:51 UTC | 87 points | 5 comments | 20 min read | LW link
Benchmarks for Detecting Measurement Tampering [Redwood Research]
ryan_greenblatt and Fabien Roger | 5 Sep 2023 16:44 UTC | 85 points | 18 comments | 20 min read | LW link (arxiv.org)
Catching AIs red-handed
ryan_greenblatt and Buck | 5 Jan 2024 17:43 UTC | 82 points | 18 comments | 17 min read | LW link
Two problems with ‘Simulators’ as a frame
ryan_greenblatt | 17 Feb 2023 23:34 UTC | 81 points | 13 comments | 5 min read | LW link
Preventing model exfiltration with upload limits
ryan_greenblatt | 6 Feb 2024 16:29 UTC | 63 points | 16 comments | 14 min read | LW link
Managing catastrophic misuse without robust AIs
ryan_greenblatt and Buck | 16 Jan 2024 17:27 UTC | 58 points | 16 comments | 11 min read | LW link
Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt, Fabien Roger and Buck | 23 Dec 2023 0:05 UTC | 56 points | 10 comments | 4 min read | LW link
Auditing failures vs concentrated failures
ryan_greenblatt and Fabien Roger | 11 Dec 2023 2:47 UTC | 44 points | 0 comments | 7 min read | LW link
Notes on control evaluations for safety cases
ryan_greenblatt, Buck and Fabien Roger | 28 Feb 2024 16:15 UTC | 32 points | 0 comments | 32 min read | LW link
Large corporations can unilaterally ban/tax ransomware payments via bets
ryan_greenblatt | 17 Jul 2021 12:56 UTC | 26 points | 5 comments | 2 min read | LW link
Researcher incentives cause smoother progress on benchmarks
ryan_greenblatt | 21 Dec 2021 4:13 UTC | 20 points | 4 comments | 1 min read | LW link
Framing approaches to alignment and the hard problem of AI cognition
ryan_greenblatt | 15 Dec 2021 19:06 UTC | 16 points | 15 comments | 27 min read | LW link
Naive self-supervised approaches to truthful AI
ryan_greenblatt | 23 Oct 2021 13:03 UTC | 9 points | 4 comments | 2 min read | LW link
[Question] Questions about multivitamins, especially manganese
ryan_greenblatt | 19 Jun 2021 16:09 UTC | 7 points | 8 comments | 1 min read | LW link
Potential gears level explanations of smooth progress
ryan_greenblatt | 22 Dec 2021 18:05 UTC | 4 points | 2 comments | 2 min read | LW link