DanielFilan comments on [AN #147]: An overview of the interpretability landscape

DanielFilan 21 Apr 2021 18:03 UTC
LW: 8 AF: 3
Notes:
- I see our results on paired images as less conclusive than the summary implies. From the paper:
Networks trained on halves-diff [i.e. paired-image] datasets are more relatively clusterable than those trained on halves-same [‘pairs’ of the same thing, which we used as a control] datasets, but not more absolutely clusterable… networks trained on stack-diff [paired] datasets are somewhat more clusterable, both in absolute and relative terms, than those trained on stack-same [control] datasets.
- Appendix A.5 gives results for training MLPs on noise images with random labels. When the MLP is able to memorize the data, it’s as clusterable as when trained on MNIST, but when it can’t memorize, it’s not particularly clusterable.
- “Another challenge is whether networks are more modular just because in a bigger model there are more chances to find good cuts. (In other words, what’s the default to which we should be comparing?)” The answer is basically ‘it depends’. In the paper, we present ‘absolute’ clusterability numbers, as well as ‘relative’ statistics that indicate how clusterable the network is relative to versions of the network where the weights are randomly shuffled, so relative clusterability is what answers this question. The ResNets are much more relatively clusterable than normal nets, meaning that their clusterability advantage isn’t just because of the architecture. Other big CNNs are relatively clusterable compared to their shuffles, but not more so than most small CNNs, meaning their clusterability advantage over small nets is likely due to their aspect ratio (if nets are deep, you can ‘cluster’ by layer and be very clusterable). But to be clear, the clusterability of non-ResNet big CNNs is not totally due to their architecture: their weights are still more clusterable than if you randomly permute them. (I think I explained this terribly, so do please ask clarifying questions.)
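To make the absolute vs. relative distinction concrete, here is a toy sketch, not the paper's code. It assumes clusterability is scored by a normalized cut (lower means more clusterable), swaps in a brute-force search over two-way splits where a real pipeline would use something like spectral clustering, and assumes shuffling means permuting the weights within each layer. The tiny 4-4-4 network and the names `edges_from_layers`, `min_ncut` and `shuffle_layer` are all made up for illustration.

```python
import random
import statistics


def edges_from_layers(layer_weights):
    """Turn a list of weight matrices into (u, v, |w|) edges over all neurons."""
    edges, offset = [], 0
    for w in layer_weights:
        rows, cols = len(w), len(w[0])
        for i in range(rows):
            for j in range(cols):
                edges.append((offset + i, offset + rows + j, abs(w[i][j])))
        offset += rows
    return edges, offset + cols  # edge list and total neuron count


def min_ncut(edges, n):
    """Smallest normalized cut over all 2-way splits (brute force, tiny graphs only)."""
    degree = [0.0] * n
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    total = sum(degree)
    best = float("inf")
    # Node n-1 always stays on side B, so each split is visited exactly once.
    for mask in range(1, 1 << (n - 1)):
        vol_a = sum(degree[i] for i in range(n - 1) if mask >> i & 1)
        vol_b = total - vol_a
        if vol_a == 0 or vol_b == 0:
            continue
        cut = sum(w for u, v, w in edges if (mask >> u & 1) != (mask >> v & 1))
        best = min(best, cut / vol_a + cut / vol_b)
    return best


def shuffle_layer(w, rng):
    """Randomly permute all entries of one weight matrix, keeping its shape."""
    flat = [x for row in w for x in row]
    rng.shuffle(flat)
    cols = len(w[0])
    return [flat[r * cols:(r + 1) * cols] for r in range(len(w))]


# Hypothetical "modular" net: two groups of neurons with strong weights inside
# each group and weak weights between groups.
STRONG, WEAK = 1.0, 0.05
block = [[STRONG if (i < 2) == (j < 2) else WEAK for j in range(4)] for i in range(4)]
layers = [block, block]

rng = random.Random(0)
edges, n = edges_from_layers(layers)
actual = min_ncut(edges, n)  # the 'absolute' number

# The 'relative' view: same architecture, same multiset of weights per layer,
# but with the weights placed at random.
shuffled = []
for _ in range(20):
    s_edges, _ = edges_from_layers([shuffle_layer(w, rng) for w in layers])
    shuffled.append(min_ncut(s_edges, n))

fraction_below = sum(s < actual for s in shuffled) / len(shuffled)
print(f"absolute n-cut: {actual:.3f}")
print(f"mean shuffled n-cut: {statistics.mean(shuffled):.3f}")
print(f"fraction of shuffles with a lower n-cut: {fraction_below:.2f}")
```

Because the shuffles keep the architecture fixed, anything the architecture alone buys you (size, depth) shows up in both the actual and the shuffled numbers, and only the gap between them is attributable to how the weights are arranged.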
- DanielFilan 21 Apr 2021 18:04 UTC
  LW: 8 AF: 3
  
  “Additionally, in an intuitive sense, pruning a network seems as though it could be defined in terms of clusterability notions, which limits my enthusiasm for that result.”
  
  I see what you mean, but there exist things called expander graphs which are very sparse (i.e. very pruned) but minimally clusterable. Now, these don’t have a topology compatible with being a neural network, but they are a proof of concept that you can prune without being clusterable. For more evidence, note that our pruned networks are more clusterable than if you permuted the weights randomly, that is, more clusterable than random pruned networks.
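The sparse-but-not-clusterable point can be checked on a toy case. The sketch below is my own illustration, not something from the paper: it uses the Petersen graph as a small stand-in for a well-connected sparse graph (expanders are properly defined as infinite families, so this is only suggestive), and compares it to a hand-built graph with the same number of nodes and edges that consists of two clumps joined by a single bridge. Clusterability is again scored by a brute-force minimum normalized cut, which is an assumption about the metric.

```python
def min_ncut(edges, n):
    """Smallest normalized cut over all 2-way splits of an unweighted graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree)
    best = float("inf")
    # Node n-1 always stays on side B, so each split is visited exactly once.
    for mask in range(1, 1 << (n - 1)):
        vol_a = sum(degree[i] for i in range(n - 1) if mask >> i & 1)
        vol_b = total - vol_a
        if vol_a == 0 or vol_b == 0:
            continue
        cut = sum(1 for u, v in edges if (mask >> u & 1) != (mask >> v & 1))
        best = min(best, cut / vol_a + cut / vol_b)
    return best


# Petersen graph: 10 nodes, 15 edges, every node has degree 3.
petersen = (
    [(i, (i + 1) % 5) for i in range(5)]             # outer 5-cycle
    + [(i, i + 5) for i in range(5)]                 # spokes
    + [(5 + i, 5 + (i + 2) % 5) for i in range(5)]   # inner pentagram
)


def clump(start):
    """Hypothetical 5-node clump: a 5-cycle plus two chords (7 edges)."""
    cycle = [(start + i, start + (i + 1) % 5) for i in range(5)]
    chords = [(start, start + 2), (start + 1, start + 3)]
    return cycle + chords


# Two clumps joined by one bridge edge: also 10 nodes and 15 edges.
clustered = clump(0) + clump(5) + [(0, 5)]

petersen_ncut = min_ncut(petersen, 10)
clustered_ncut = min_ncut(clustered, 10)
print(f"edges: {len(petersen)} vs {len(clustered)}")
print(f"Petersen min n-cut:  {petersen_ncut:.3f}")
print(f"clustered min n-cut: {clustered_ncut:.3f}")
```

Both graphs are equally sparse, yet the best cut of the Petersen graph scores 2/3 while the two-clump graph scores 2/15, so sparsity by itself does not hand you a good cut.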