
Redwood Research

Tag

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
182 points
23 comments
20 min read
LW link

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

27 Oct 2022 1:32 UTC
134 points
14 comments
12 min read
LW link

Redwood Research’s current project

Buck
21 Sep 2021 23:30 UTC
144 points
30 comments
15 min read
LW link
1 review

Redwood’s Technique-Focused Epistemic Strategy

adamShimi
12 Dec 2021 16:36 UTC
48 points
1 comment
7 min read
LW link

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan
21 Aug 2022 23:50 UTC
16 points
0 comments
34 min read
LW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz
17 Sep 2022 3:55 UTC
137 points
10 comments
6 min read
LW link

Why I’m excited about Redwood Research’s current project

paulfchristiano
12 Nov 2021 19:26 UTC
112 points
6 comments
7 min read
LW link

High-stakes alignment via adversarial training [Redwood Research report]

5 May 2022 0:59 UTC
142 points
29 comments
9 min read
LW link

We’re Redwood Research, we do applied alignment research, AMA

Nate Thomas
6 Oct 2021 5:51 UTC
56 points
3 comments
2 min read
LW link
(forum.effectivealtruism.org)

Redwood Research is hiring for several roles

29 Nov 2021 0:16 UTC
44 points
0 comments
1 min read
LW link

Redwood Research is hiring for several roles (Operations and Technical)

14 Apr 2022 16:57 UTC
29 points
0 comments
1 min read
LW link

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small

12 Oct 2022 21:25 UTC
50 points
11 comments
4 min read
LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
95 points
7 comments
9 min read
LW link
(arxiv.org)

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
33 points
2 comments
30 min read
LW link

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
0 comments
17 min read
LW link

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
17 points
4 comments
20 min read
LW link

Practical Pitfalls of Causal Scrubbing

27 Mar 2023 7:47 UTC
56 points
11 comments
13 min read
LW link