Great points. I definitely agree with your argument quantitatively: these distinctions mean that a probabilistic model will be more interpretable for the same system, or able to handle more complex systems for a given interpretability metric (e.g. before "running into catastrophic misalignment").
That said, it does seem like the vast majority of interpretability work for both probabilistic and ML systems is about answering "how does this internal stuff correspond to stuff in the world?". So qualitatively, the central interpretability problem seems basically the same for both.
Yeah, I agree that if you learn a probabilistic model then you mostly have a difference in degree rather than difference in kind with respect to interpretability. It’s not super clear that the difference in degree is large or important (it seems like it could be, just not clear). And if you aren’t willing to learn a probabilistic model, then you are handicapping your system in a way that will probably eventually be a big deal.
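To make the "how does this internal stuff correspond to stuff in the world" framing concrete, here is a minimal toy sketch (nothing here is from the discussion above; the rain/sprinkler setup, the variable names, and all the numbers are made up for illustration). In the probabilistic model the internal quantities are named world-states by construction, while in the ML-style model you have to probe anonymous hidden units after the fact:

```python
# Toy illustration only; the model structure, names, and numbers are made up.
import numpy as np

rng = np.random.default_rng(0)

# --- Probabilistic model: the latent variables are named world-states by construction.
# "What does this internal quantity mean?" is answered by its definition.
p_rain = 0.2
p_sprinkler = 0.3
p_wet_given = {(0, 0): 0.05, (0, 1): 0.80, (1, 0): 0.90, (1, 1): 0.95}

def p_wet():
    """Marginal P(grass is wet), summing over the named latents rain/sprinkler."""
    total = 0.0
    for rain in (0, 1):
        for sprinkler in (0, 1):
            prior = (p_rain if rain else 1 - p_rain) * (p_sprinkler if sprinkler else 1 - p_sprinkler)
            total += prior * p_wet_given[(rain, sprinkler)]
    return total

# --- ML-style model: hidden units have no built-in correspondence to the world.
# To interpret them we have to probe afterwards and guess what (if anything) they track.
W = rng.normal(size=(2, 8))  # inputs (rain, sprinkler) -> 8 anonymous hidden units

def hidden(x):
    return np.tanh(x @ W)

inputs = np.array([[r, s] for r in (0, 1) for s in (0, 1)], dtype=float)
H = hidden(inputs)
# A crude probe: how strongly does each anonymous unit correlate with "rain"?
rain = inputs[:, 0]
probe = [round(float(np.corrcoef(H[:, j], rain)[0, 1]), 2) for j in range(H.shape[1])]

print("P(wet) from the probabilistic model:", round(p_wet(), 3))
print("hidden-unit vs. 'rain' correlations (what probing buys you):", probe)
```

The same question ("what does this internal quantity track?") shows up for both models; the probabilistic one just answers it by construction, which is the difference-in-degree being discussed above.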