
Redwood Research

Last edit: 30 Dec 2024 10:12 UTC by Dakara

Redwood Research is a nonprofit organization focused on mitigating risks from advanced artificial intelligence.

Their research agenda spans AI control, mechanistic interpretability, and adversarial training for high-stakes reliability; posts related to Redwood Research are listed below.

Alignment Faking in Large Language Models

18 Dec 2024 17:19 UTC
483 points
75 comments · 10 min read · LW link

The case for ensuring that powerful AIs are controlled

24 Jan 2024 16:11 UTC
276 points
73 comments · 28 min read · LW link

Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research]

3 Dec 2022 0:58 UTC
206 points
35 comments · 20 min read · LW link · 1 review

Takeaways from our robust injury classifier project [Redwood Research]

dmz · 17 Sep 2022 3:55 UTC
143 points
12 comments · 6 min read · LW link · 1 review

AI Control: Improving Safety Despite Intentional Subversion

13 Dec 2023 15:51 UTC
236 points
24 comments · 10 min read · LW link · 4 reviews

Benchmarks for Detecting Measurement Tampering [Redwood Research]

5 Sep 2023 16:44 UTC
87 points
22 comments · 20 min read · LW link · 1 review
(arxiv.org)

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan · 21 Aug 2022 23:50 UTC
16 points
0 comments · 35 min read · LW link

Preventing Language Models from hiding their reasoning

31 Oct 2023 14:34 UTC
119 points
15 comments · 12 min read · LW link · 1 review

Redwood Research’s current project

Buck · 21 Sep 2021 23:30 UTC
145 points
29 comments · 15 min read · LW link · 1 review

Redwood’s Technique-Focused Epistemic Strategy

adamShimi · 12 Dec 2021 16:36 UTC
48 points
1 comment · 7 min read · LW link

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

27 Oct 2022 1:32 UTC
135 points
14 comments · 12 min read · LW link

Catching AIs red-handed

5 Jan 2024 17:43 UTC
110 points
27 comments · 17 min read · LW link

Why I’m excited about Redwood Research’s current project

paulfchristiano · 12 Nov 2021 19:26 UTC
114 points
6 comments · 7 min read · LW link

How will we update about scheming?

ryan_greenblatt · 6 Jan 2025 20:21 UTC
169 points
20 comments · 36 min read · LW link

High-stakes alignment via adversarial training [Redwood Research report]

5 May 2022 0:59 UTC
142 points
29 comments · 9 min read · LW link

Will alignment-faking Claude accept a deal to reveal its misalignment?

31 Jan 2025 16:49 UTC
197 points
28 comments · 12 min read · LW link

Some common confusion about induction heads

Alexandre Variengien · 28 Mar 2023 21:51 UTC
64 points
4 comments · 5 min read · LW link

Scalable Oversight and Weak-to-Strong Generalization: Compatible approaches to the same problem

16 Dec 2023 5:49 UTC
76 points
4 comments · 6 min read · LW link · 1 review

Measurement tampering detection as a special case of weak-to-strong generalization

23 Dec 2023 0:05 UTC
57 points
10 comments · 4 min read · LW link

Coup probes: Catching catastrophes with probes trained off-policy

Fabien Roger · 17 Nov 2023 17:58 UTC
91 points
9 comments · 11 min read · LW link · 1 review

Balancing Label Quantity and Quality for Scalable Elicitation

Alex Mallen · 24 Oct 2024 16:49 UTC
31 points
1 comment · 2 min read · LW link

A quick experiment on LMs’ inductive biases in performing search

Alex Mallen · 14 Apr 2024 3:41 UTC
32 points
2 comments · 4 min read · LW link

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan · 10 Aug 2023 0:48 UTC
66 points
35 comments · 6 min read · LW link

Polysemanticity and Capacity in Neural Networks

7 Oct 2022 17:51 UTC
87 points
14 comments · 3 min read · LW link

A sketch of an AI control safety case

30 Jan 2025 17:28 UTC
57 points
0 comments · 5 min read · LW link

We’re Redwood Research, we do applied alignment research, AMA

Nate Thomas · 6 Oct 2021 5:51 UTC
56 points
2 comments · 2 min read · LW link
(forum.effectivealtruism.org)

Redwood Research is hiring for several roles

29 Nov 2021 0:16 UTC
44 points
0 comments · 1 min read · LW link

Redwood Research is hiring for several roles (Operations and Technical)

14 Apr 2022 16:57 UTC
29 points
0 comments · 1 min read · LW link

Help out Redwood Research’s interpretability team by finding heuristics implemented by GPT-2 small

12 Oct 2022 21:25 UTC
50 points
11 comments · 4 min read · LW link

Measuring whether AIs can statelessly strategize to subvert security measures

19 Dec 2024 21:25 UTC
62 points
0 comments · 11 min read · LW link

Causal scrubbing: results on a paren balance checker

3 Dec 2022 0:59 UTC
34 points
2 comments · 30 min read · LW link

Causal scrubbing: results on induction heads

3 Dec 2022 0:59 UTC
34 points
1 comment · 17 min read · LW link

Causal scrubbing: Appendix

3 Dec 2022 0:58 UTC
18 points
4 comments · 20 min read · LW link

Practical Pitfalls of Causal Scrubbing

27 Mar 2023 7:47 UTC
87 points
17 comments · 13 min read · LW link

Some Lessons Learned from Studying Indirect Object Identification in GPT-2 small

28 Oct 2022 23:55 UTC
101 points
9 comments · 9 min read · LW link · 2 reviews
(arxiv.org)

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

26 Jul 2023 17:02 UTC
99 points
19 comments · 1 min read · LW link · 1 review

Managing catastrophic misuse without robust AIs

16 Jan 2024 17:27 UTC
63 points
17 comments · 11 min read · LW link

How to prevent collusion when using untrusted models to monitor each other

Buck · 25 Sep 2024 18:58 UTC
89 points
11 comments · 22 min read · LW link

Apply to the ML for Alignment Bootcamp (MLAB) in Berkeley [Jan 3 - Jan 22]

3 Nov 2021 18:22 UTC
95 points
4 comments · 1 min read · LW link

Some ideas for follow-up projects to Redwood Research’s recent paper

JanB · 6 Jun 2022 13:29 UTC
10 points
0 comments · 7 min read · LW link

Apply to the second iteration of the ML for Alignment Bootcamp (MLAB 2) in Berkeley [Aug 15 - Fri Sept 2]

Buck · 6 May 2022 4:23 UTC
69 points
0 comments · 6 min read · LW link

Why imperfect adversarial robustness doesn’t doom AI control

18 Nov 2024 16:05 UTC
62 points
25 comments · 2 min read · LW link

Win/continue/lose scenarios and execute/replace/audit protocols

Buck · 15 Nov 2024 15:47 UTC
64 points
2 comments · 7 min read · LW link

A basic systems architecture for AI agents that do autonomous research

Buck · 23 Sep 2024 13:58 UTC
189 points
16 comments · 8 min read · LW link

Access to powerful AI might make computer security radically easier

Buck · 8 Jun 2024 6:00 UTC
101 points
14 comments · 6 min read · LW link

Programmatic backdoors: DNNs can use SGD to run arbitrary stateful computation

23 Oct 2023 16:37 UTC
107 points
3 comments · 8 min read · LW link

Untrusted smart models and trusted dumb models

Buck · 4 Nov 2023 3:06 UTC
87 points
17 comments · 6 min read · LW link · 1 review

Preventing model exfiltration with upload limits

ryan_greenblatt · 6 Feb 2024 16:29 UTC
69 points
22 comments · 14 min read · LW link

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt · 30 Oct 2023 14:51 UTC
112 points
8 comments · 20 min read · LW link · 1 review

[Paper] Stress-testing capability elicitation with password-locked models

4 Jun 2024 14:52 UTC
85 points
10 comments · 12 min read · LW link
(arxiv.org)

Notes on control evaluations for safety cases

28 Feb 2024 16:15 UTC
49 points
0 comments · 32 min read · LW link

Toy models of AI control for concentrated catastrophe prevention

6 Feb 2024 1:38 UTC
51 points
2 comments · 7 min read · LW link