xuan comments on EIS V: Blind Spots In AI Safety Interpretability Research

xuan 16 Feb 2023 20:51 UTC
LW: 2 AF: 2
−1
AF
Regarding causal scrubbing in particular, it seems to me that there’s a closely related line of research by Geiger, Icard and Potts that TAISIC doesn’t seem to be engaging with deeply? I haven’t looked too closely, but it may be another example of duplicated effort / rediscovery:
The importance of interventions
Over a series of recent papers (Geiger et al. 2020, Geiger et al. 2021, Geiger et al. 2022, Wu et al. 2022a, Wu et al. 2022b), we have argued that the theory of causal abstraction (Chalupka et al. 2016, Rubinstein et al. 2017, Beckers and Halpern 2019, Beckers et al. 2019) provides a powerful toolkit for achieving the desired kinds of explanation in AI. In causal abstraction, we assess whether a particular high-level (possibly symbolic) model H is a faithful proxy for a lower-level (in our setting, usually neural) model N in the sense that the causal effects of components in H summarize the causal effects of components of N. In this scenario, N is the AI model that has been deployed to solve a particular task, and H is one’s probably partial, high-level characterization of how the task domain works (or should work). Where this relationship between N and H holds, we say that H is a causal abstraction of N. This means that we can use H to directly engage with high-level questions of robustness, fairness, and safety in deploying N for real-world tasks.
Source: https://ai.stanford.edu/blog/causal-abstraction/
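To make the quoted definition a bit more concrete, here is a rough toy sketch of how one might test whether H is a causal abstraction of N by swapping intermediate values between runs (in the spirit of the interchange interventions used in the Geiger et al. line of work). Everything in it is an assumption made for illustration: the arithmetic task, the two toy models, and the names `high_level`, `low_level` and `is_causal_abstraction` are hypothetical and not taken from the blog post or the papers.

```python
from itertools import product

def high_level(a, b, c, s_override=None):
    """H: symbolic model with one intermediate variable S = a + b."""
    s = a + b if s_override is None else s_override
    return s + c

def hidden(a, b, c):
    """Hidden state of the toy low-level model N (hypothetical)."""
    return [2 * (a + b), c]  # unit 0 stores a scaled copy of a + b

def low_level(a, b, c, h_override=None):
    """N: toy 'network'; h_override=(index, value) patches one hidden unit."""
    h = hidden(a, b, c)
    if h_override is not None:
        idx, value = h_override
        h[idx] = value
    return h[0] // 2 + h[1]

def is_causal_abstraction(hidden_index, inputs):
    """Check that intervening on S in H matches patching one unit of N.

    For every (base, source) pair, set the aligned component to the value
    it takes on the source input, run on the base input, and compare outputs.
    """
    for base, source in product(inputs, repeat=2):
        s_src = source[0] + source[1]          # value of S on the source input
        h_src = hidden(*source)[hidden_index]  # value of the unit on the source input
        out_h = high_level(*base, s_override=s_src)
        out_n = low_level(*base, h_override=(hidden_index, h_src))
        if out_h != out_n:
            return False
    return True

# Small exhaustive input set for the toy task (an assumption of this sketch).
INPUTS = list(product(range(3), repeat=3))
```

Under these assumptions, `is_causal_abstraction(0, INPUTS)` holds because unit 0 plays the same causal role in N that S plays in H, while `is_causal_abstraction(1, INPUTS)` fails because unit 1 carries c rather than a + b. Real neural models would need a search over alignments and approximate rather than exact agreement.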
- LawrenceC 16 Feb 2023 21:25 UTC
  LW: 7 AF: 6
  3
  AF Parent
  We were quite familiar with Geiger et al.’s work before writing the post, and think it’s importantly different. Though it seems like we forgot to cite it in the Causal Scrubbing AF post, whoops.
  
  Hopefully this will be fixed with the forthcoming arXiv paper!
  - xuan 16 Feb 2023 22:47 UTC
    LW: 2 AF: 2
    0
    AF Parent
    Great to know, and good to hear!
