Thanks a lot for this article! I have a few questions:
Even after a literature review confirms a research question is unexplored, how can a beginner like me get a good sense, before running experiments, of whether the question is exploring new ground versus just confirming something that’s already ‘obvious’, or whether a method I’m developing would actually be useful? I feel like most papers only report the results the researchers found useful or interesting, though reading papers does help me get a feel for which methods are general or useful.
My other question is about what “mechanistic” really means. I’ve gotten the impression from older texts that the bar for “mechanistic” is a causal explanation: for example, not just finding a feature vector, but showing that steering along that feature predictably changes the behavior. Is there a sharp distinction between the two, or has the definition changed over time?
This is basically a question of research taste. You’ll get much better at this over time. When starting out, I recommend just not worrying about it and focusing on learning how to do anything at all.
The definition of “mechanistic” has drifted somewhat over time. I personally think of “does this use internals?” and “is this rigorous?” as two separate axes. I care a lot about both, and causal interventions are one powerful form of rigour.
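To make the steering example from your question concrete, here is a minimal sketch of that kind of causal intervention: adding a vector to a transformer layer's residual stream and checking whether generations change predictably relative to a baseline. The model name, layer index, scale, and the random stand-in "feature vector" are all illustrative assumptions, not anything from the article; in practice the vector would come from e.g. an SAE feature or a difference of mean activations between two prompt sets.

```python
# Hypothetical sketch of a causal intervention via activation steering.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 6    # which block to intervene on (assumption)
SCALE = 4.0  # intervention strength (assumption)

# Stand-in "feature vector"; replace with a vector you actually care about.
steering_vector = torch.randn(model.config.n_embd)
steering_vector = steering_vector / steering_vector.norm()

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states;
    # add the steering vector to every position's residual stream.
    hidden = output[0] + SCALE * steering_vector.to(output[0].dtype)
    return (hidden,) + output[1:]

prompt = "I think the weather today is"
ids = tok(prompt, return_tensors="pt").input_ids

# Steered run: hook is active during generation.
handle = model.transformer.h[LAYER].register_forward_hook(add_steering)
with torch.no_grad():
    steered = model.generate(ids, max_new_tokens=20, do_sample=False)
handle.remove()  # restore the unmodified model

# Baseline run: no intervention.
with torch.no_grad():
    baseline = model.generate(ids, max_new_tokens=20, do_sample=False)

print("steered: ", tok.decode(steered[0]))
print("baseline:", tok.decode(baseline[0]))
```

The "rigour" part is then whether the steered behavior changes in the way your hypothesis about the feature predicts, across many prompts and controls, rather than in one cherry-picked generation.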