We are seeing a number of pushes to get many more people involved in interpretability work
Context: I work at Redwood.
You linked to REMIX here, but I wouldn’t necessarily argue for more people doing interpretability on the margin (and I think Buck probably roughly agrees with me here).
I think it’s plausible that too much effort is going to interp at the margin.
I’m personally far more worried about interpretability work being well directed and high quality than about the number of people involved. (Based on what this post implies, it seems I potentially agree with you on this point.)
Edit: it seems like you define interpretability very broadly, to the point where I’m a bit confused about what is or isn’t interpretability work. This comment should be interpreted as referring to interpretability as ‘someone (humans or AIs) getting a better understanding of how an AI works (often with a mechanistic connotation)’.
I think it’s plausible that too much effort is going to interp at the margin
What’s the counterfactual? Do you think newer people interested in AI safety should be doing other things instead of, for example, attempting one of the 200+ MI problems suggested by Neel Nanda? What other things?
Interesting to know that about the plan. I had assumed that REMIX was in large part about getting more people into this type of work. But I’m interested in the conclusions and current views on it. Is there a post reflecting on how it went and what lessons were learned from it?
Don’t think there have been public writeups, but here are two relevant Manifold markets: