Your way of doing it basically approximates the network to first order in the parameter changes (equivalently, to second order in the loss). That's really the same as the method I'm proposing above, except that you change the features to account for the chain rule acting on the layers in front of them. You're effectively transforming the network into an equivalent one with a single linear layer, whose features are the entries of ∇_v f(x,Θ).
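A minimal numpy sketch of that equivalence (the toy network, its shapes, and the perturbation size here are all made up for illustration): to first order, the change in the network's output under a parameter perturbation Δθ is just a dot product of Δθ with the "features" ∇_θ f(x,θ), i.e. a single linear layer acting on the gradient entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy two-layer network f(x, theta) = W2 @ tanh(W1 @ x).
W1 = rng.normal(size=(4, 3)) * 0.5
W2 = rng.normal(size=(1, 4)) * 0.5

def f(x, W1, W2):
    return (W2 @ np.tanh(W1 @ x)).item()

def grad_theta(x, W1, W2):
    # Analytic gradient of f w.r.t. all parameters -- these entries play
    # the role of the features of the equivalent single linear layer.
    h = np.tanh(W1 @ x)
    dW1 = np.outer(W2.ravel() * (1 - h**2), x)  # df/dW1, shape (4, 3)
    dW2 = h                                     # df/dW2, shape (4,)
    return np.concatenate([dW1.ravel(), dW2.ravel()])

x = rng.normal(size=3)
phi = grad_theta(x, W1, W2)  # feature vector, one entry per parameter

# Small parameter perturbation: compare the linearised prediction
# f(x, theta) + phi . dtheta against the true perturbed network.
dtheta = rng.normal(size=phi.size) * 1e-3
W1p = W1 + dtheta[:12].reshape(4, 3)
W2p = W2 + dtheta[12:].reshape(1, 4)

f_lin = f(x, W1, W2) + phi @ dtheta  # first-order approximation
f_true = f(x, W1p, W2p)
err = abs(f_lin - f_true)            # O(|dtheta|^2), so tiny here
print(err)
```

For perturbations this small, the linearised output matches the true network to second order in Δθ, which is why the picture works near an optimum but degrades for larger parameter movements.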
That's fine to do when you're near a global optimum, the case discussed in the main body of this post, and for tiny changes it will hold even in general. But for a broader, layer-by-layer understanding of the dynamics, I think insisting on the transformation to imagespace might not be so productive.
Note that imagespace ≠ interpretable. You can recognise a dog head detector just fine by looking at its activations; there's no need to transpose it into imagespace somehow.