[Question] How did you get started in mechanistic interpretability? What other paths have you seen work?
I’m mapping out how people enter mechanistic interpretability, and it seems to me that there isn’t a single agreed-upon route. Some people begin by reproducing classic experiments, some come in through theory or causal ML, and others build tools or jump in from bio, RL, or vision.
I would appreciate stories that answer:
How did you start?
What worked?
What didn’t?
What would you do differently if starting today?
Are there alternative routes you’ve seen that people underestimate?
All perspectives are welcome, even partial experiences.