Great points. I definitely agree with your argument quantitatively: these distinctions mean that a probabilistic model will be more interpretable for the same system, or able to handle more complex systems for a given interpretability metric (e.g. before "running into catastrophic misalignment").
That said, it does seem like the vast majority of interpretability work for both probabilistic and ML systems is about answering "how does this internal stuff correspond to stuff in the world?". So qualitatively, the central interpretability problem seems basically the same for both.
Yeah, I agree that if you learn a probabilistic model then you mostly have a difference in degree rather than difference in kind with respect to interpretability. It’s not super clear that the difference in degree is large or important (it seems like it could be, just not clear). And if you aren’t willing to learn a probabilistic model, then you are handicapping your system in a way that will probably eventually be a big deal.
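To make the "how does this internal stuff correspond to stuff in the world" framing concrete, here is a minimal toy sketch (nothing here is from the discussion above; the rain/sprinkler setup, the variable names, and all the numbers are made up for illustration). In the probabilistic model the internal quantities are named world-states by construction, while in the ML-style model you have to probe anonymous hidden units after the fact:

```python
# Toy illustration only; the model structure, names, and numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

# --- Probabilistic model: the latent variables are named world-states by construction.
# "What does this internal quantity mean?" is answered by its definition.
p_rain = 0.2
p_sprinkler = 0.3
p_wet_given = {(0, 0): 0.05, (0, 1): 0.80, (1, 0): 0.90, (1, 1): 0.95}

def p_wet():
    """Marginal P(grass is wet), summing over the named latents rain/sprinkler."""
    total = 0.0
    for rain in (0, 1):
        for sprinkler in (0, 1):
            prior = (p_rain if rain else 1 - p_rain) * (p_sprinkler if sprinkler else 1 - p_sprinkler)
            total += prior * p_wet_given[(rain, sprinkler)]
    return total

# --- ML-style model: hidden units have no built-in correspondence to the world.
# To interpret them we have to probe afterwards and guess what (if anything) they track.
W = rng.normal(size=(2, 8))  # inputs (rain, sprinkler) -> 8 anonymous hidden units

def hidden(x):
    return np.tanh(x @ W)

inputs = np.array([[r, s] for r in (0, 1) for s in (0, 1)], dtype=float)
H = hidden(inputs)
# A crude probe: how strongly does each anonymous unit correlate with "rain"?
rain = inputs[:, 0]
probe = [round(float(np.corrcoef(H[:, j], rain)[0, 1]), 2) for j in range(H.shape[1])]

print("P(wet) from the probabilistic model:", round(p_wet(), 3))
print("hidden-unit vs. 'rain' correlations (what probing buys you):", probe)
```

The same question ("what does this internal quantity track?") shows up for both models; the probabilistic one just answers it by construction, which is the difference-in-degree being discussed above.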