ryan_greenblatt

Karma: 6,143

I work at Redwood Research.

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
245 points
66 comments
28 min read
LW link

How useful is mechanistic interpretability?

1 Dec 2023 2:54 UTC
155 points
53 comments
25 min read
LW link

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt 30 Oct 2023 14:51 UTC
87 points
5 comments
20 min read
LW link

Benchmarks for Detecting Measurement Tampering [Redwood Research]

5 Sep 2023 16:44 UTC
85 points
18 comments
20 min read
LW link
(arxiv.org)

Catching AIs red-handed

5 Jan 2024 17:43 UTC
82 points
18 comments
17 min read
LW link

Two problems with ‘Simulators’ as a frame

ryan_greenblatt 17 Feb 2023 23:34 UTC
81 points
13 comments
5 min read
LW link

Preventing model exfiltration with upload limits

ryan_greenblatt 6 Feb 2024 16:29 UTC
63 points
16 comments
14 min read
LW link

Managing catastrophic misuse without robust AIs

16 Jan 2024 17:27 UTC
58 points
16 comments
11 min read
LW link

Measurement tampering detection as a special case of weak-to-strong generalization

23 Dec 2023 0:05 UTC
56 points
10 comments
4 min read
LW link

Auditing failures vs concentrated failures

11 Dec 2023 2:47 UTC
44 points
0 comments
7 min read
LW link

Notes on control evaluations for safety cases

28 Feb 2024 16:15 UTC
32 points
0 comments
32 min read
LW link

Large corporations can unilaterally ban/tax ransomware payments via bets

ryan_greenblatt 17 Jul 2021 12:56 UTC
26 points
5 comments
2 min read
LW link

Researcher incentives cause smoother progress on benchmarks

ryan_greenblatt 21 Dec 2021 4:13 UTC
20 points
4 comments
1 min read
LW link

Framing approaches to alignment and the hard problem of AI cognition

ryan_greenblatt 15 Dec 2021 19:06 UTC
16 points
15 comments
27 min read
LW link

Naive self-supervised approaches to truthful AI

ryan_greenblatt 23 Oct 2021 13:03 UTC
9 points
4 comments
2 min read
LW link

[Question] Questions about multivitamins, especially manganese

ryan_greenblatt 19 Jun 2021 16:09 UTC
7 points
8 comments
1 min read
LW link

Potential gears level explanations of smooth progress

ryan_greenblatt 22 Dec 2021 18:05 UTC
4 points
2 comments
2 min read
LW link