We are seeing a number of pushes to get many more people involved in interpretability work
Context: I work at Redwood.
You linked to REMIX here, but I wouldn’t necessarily argue for more people doing interpretability on the margin (and I think Buck probably roughly agrees with me here).
I think it’s plausible that too much effort is going to interp at the margin.
I’m personally far more worried about interpretability work being well directed and high quality than about the number of people involved. (Based on what this post implies, it seems I potentially agree with you on this point.)
Edit: it seems like you define interpretability very broadly, to the point where I’m a bit confused about what is or isn’t interpretability work. This comment should be interpreted as referring to interpretability as ‘someone (humans or AIs) getting a better understanding of how an AI works (often with a mechanistic connotation)’.
I think it’s plausible that too much effort is going to interp at the margin
What’s the counterfactual? Do you think newer people interested in AI safety should be doing other things instead of, for example, attempting one of the 200+ MI problems suggested by Neel Nanda? What other things?
Interesting to know that about the plan. I had assumed that REMIX was in large part about getting more people into this type of work. But I’m interested in the conclusions and current views on it. Is there a post reflecting on how it went and what lessons were learned from it?
Don’t think there have been public writeups, but here are two relevant Manifold markets: