I don’t see how this is a success at doing something useful on a real task.
Because I don’t think this is realistically useful, it doesn’t reduce my probability at all that your techniques are fake and your models of interpretability are wrong.
Maybe the groundedness you’re talking about comes from the fact that you’re doing interp on a domain of practical importance? I agree that doing things on a domain of practical importance might make it easier to be grounded. But it mostly seems like it would be helpful because it gives you well-tuned baselines to compare your results to. I don’t think you have results that can cleanly be compared to well-established baselines?
(Tbc, I don’t think this work is particularly more ungrounded/sloppy than other interp, though I haven’t engaged with it much. I’m just not sure why you’re referring to groundedness as a particular strength. I could very well be wrong here.)
Lawrence, how are these results any more grounded than any other interp work?