Hi, thanks for the comment!
By “linear” I mean linear in the feature space, just as kernel machines are considered “linear” under a specific embedding of the data.
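To make the analogy explicit, in standard kernel-machine notation (generic notation, not taken from the paper):

$$ f(x) \;=\; \langle w,\ \varphi(x) \rangle_{\mathcal{H}} \;=\; \sum_i \alpha_i\, k(x_i, x), \qquad k(x, x') \;=\; \langle \varphi(x),\ \varphi(x') \rangle_{\mathcal{H}}, $$

i.e. $f$ is linear in the embedding $\varphi(x)$, even though $\varphi$ itself is highly nonlinear in $x$.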
Regarding saliency maps, I still think my method can be considered faithful; in fact, the whole theoretical toolset I develop serves to argue for the faithfulness of excitation pullbacks, in particular Hypothesis 1. I argue that the model approximates a kernel machine in the path space precisely to motivate why excitation pullbacks might be faithful, i.e. they reveal the decision boundary of the more regular underlying model and point to exactly where the gradient noise comes from (in short, I claim that gradients are noisy because they correspond to rank-1 tensors in the feature space, whereas the network actually learns a higher-rank feature map).
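One way to see where the noise could enter, again in generic notation (my own illustration, not a derivation from the paper): for $f(x) = \langle w,\ \varphi(x) \rangle$ the chain rule gives

$$ \nabla_x f(x) \;=\; J_\varphi(x)^{\top} w, $$

so the raw input gradient contracts the entire Jacobian $J_\varphi(x)$ of the feature map onto a single direction, and any roughness in $J_\varphi$ passes straight through to the pixel-space gradient.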
Also notice that I perform just 5 steps of rudimentary gradient ascent in pixel space, with no additional regularisation, and immediately obtain very sensible-looking results that are both input- and target-specific. Arguably, the highlighted features are exactly those that a human would highlight when asked to accentuate the most salient features predicting a given class.
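For concreteness, here is a minimal PyTorch sketch of that procedure; the model, step size, and function names are placeholders of mine, and the ascent direction here is the plain input gradient, standing in for whichever attribution signal one ascends along:

```python
import torch

def pixel_space_ascent(model, x, target, steps=5, lr=1.0):
    """Rudimentary gradient ascent on a target logit in pixel space:
    `steps` raw gradient steps, no regularisation, no projection.
    `model`, `lr` and the ascent signal are assumptions of this sketch."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logit = model(x)[0, target]          # score of the target class
        grad, = torch.autograd.grad(logit, x)
        with torch.no_grad():
            x += lr * grad                   # plain ascent step, unconstrained
    return x.detach()

# Usage: x0 is a preprocessed (1, C, H, W) image tensor.
# x_vis = pixel_space_ascent(model, x0, target=class_idx)
```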