It turns out that DNNs are remarkably interpretable.

Link post

I recently posted a paper suggesting that deep networks may harbor an implicitly linear model, recoverable via a form of gradient denoising. The method, called excitation pullback, produces crisp, human-aligned features and offers a structural lens on generalization. Just look at the explanations for an ImageNet-pretrained ResNet50 on the front page of the paper.
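
As a rough intuition pump (the paper gives the precise construction), here is a minimal PyTorch sketch of one way this kind of gradient denoising can look: keep the forward pass of a pretrained ResNet50 exactly as is, but backpropagate through a soft, excitation-dependent gate instead of the hard 0/1 ReLU derivative, producing a smoothed input-gradient attribution. The `SoftGateReLU` module, `swap_relus` helper, and `beta` parameter are illustrative placeholders rather than the paper's exact definitions.

```python
import torch
import torch.nn as nn
from torchvision import models


class SoftGateReLU(nn.Module):
    """Forward pass is an exact ReLU; the backward pass is gated by a
    sigmoid of the pre-activation instead of the hard 0/1 ReLU derivative."""

    def __init__(self, beta: float = 5.0):
        super().__init__()
        self.beta = beta

    def forward(self, x):
        gate = torch.sigmoid(self.beta * x).detach()  # soft gate, no grad through it
        soft = gate * x                               # gradient path: d(soft)/dx = gate
        hard = torch.relu(x)
        # Value equals ReLU(x); the gradient flows only through the soft branch.
        return soft + (hard - soft).detach()


def swap_relus(module: nn.Module, beta: float = 5.0) -> None:
    """Recursively replace every nn.ReLU in the model with SoftGateReLU."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, SoftGateReLU(beta))
        else:
            swap_relus(child, beta)


model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()
swap_relus(model)

x = torch.randn(1, 3, 224, 224, requires_grad=True)  # stand-in for a preprocessed image
logits = model(x)
target = logits.argmax(dim=1).item()
logits[0, target].backward()
attribution = x.grad  # denoised input gradient; e.g. average |grad| over channels to plot
```

Averaging the absolute values of `attribution` over the channel dimension and overlaying the result on the input gives a heatmap you can compare against the explanations referenced above.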