Compute the small change in data dx which would induce a small change in trained parameter values dθ along each of the narrowest directions of the ridge in the loss landscape (i.e. the eigenvectors of the Hessian with the largest eigenvalues).
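Concretely, one way that computation could be sketched (a hypothetical PyTorch toy example, not part of the proposal itself: the model, the data, the power-iteration eigenvector estimate, and the first-order proxy for dx are all illustrative assumptions):

```python
# Hypothetical sketch of the proposed experiment on a toy classifier.
# The model, data, and the first-order dx proxy below are illustrative
# assumptions, not details from the proposal itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(256, 10, requires_grad=True)  # toy "dataset"
y = torch.randint(0, 3, (256,))
model = nn.Sequential(nn.Linear(10, 32), nn.Tanh(), nn.Linear(32, 3))
params = list(model.parameters())
n_params = sum(p.numel() for p in params)

def flat_grad():
    # Differentiable, flattened gradient of the loss w.r.t. all parameters.
    loss = F.cross_entropy(model(X), y)
    grads = torch.autograd.grad(loss, params, create_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

# Power iteration for the top Hessian eigenvector v (largest eigenvalue),
# using Hessian-vector products so the Hessian is never materialized.
v = torch.randn(n_params)
v /= v.norm()
for _ in range(50):
    Hv = torch.autograd.grad(flat_grad() @ v, params)
    v = torch.cat([h.reshape(-1) for h in Hv]).detach()
    v /= v.norm()

# First-order proxy for dx: the data-space gradient of the component of the
# parameter gradient along v, i.e. M^T v with M = d^2 L / dtheta dx.  (The
# exact dx inducing dtheta along v would need a pseudoinverse solve of
# M dx = -lambda * v; this just gives the natural candidate direction.)
dx = torch.autograd.grad(flat_grad() @ v, X)[0]
```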
I've been thinking about what results this experiment would yield (have been too lazy to actually perform the experiment myself). You've probably already performed the experiment, so my theorizing here probably isn't useful to you, but I thought I should bring it up anyway, so you can correct my theorizing if wrong/so other people can learn from it.
I believe this dx would immediately bring you "off the data manifold", unless perhaps the network has been trained to be very robust.
For instance, the first eigenvector of the Hessian probably represents the average output of the model, but if e.g. your model is an image classifier and all the images in the dataset have a white background, then rather than just using the network's built-in bias parameters to control the average output, it could totally decide to just pick a random combination of those white pixels and use them for the intercept. But there's no reason two different networks are going to use the same combination, since it's a massively underspecified problem, so this dx won't generalize to other networks.
I did try it on a simple MNIST classifier. The main result was that all effects were dominated by a handful of misclassified or barely-correctly-classified data points, and the phenomenon I originally hypothesized just wasn't super relevant.
Since then, I've also tried a different kind of experiment to translate interpretable features across nets, this time on a simple generative model. Basically, the experiment just directly applied the natural abstraction hypothesis to the image-distributions produced by nets trained on the same data (using a first-order approximation). That one worked a lot better, but didn't really connect to peak breadth or even say much about network internals in general.
I did try it on a simple MNIST classifier. The main result was that all effects were dominated by a handful of misclassified or barely-correctly-classified data points, and the phenomenon I originally hypothesized just wasn't super relevant.
Ah, I had been thinking that this method would weight these sorts of data points highly, but I wasn't sure how critical it would be. I've assumed it would be possible to reweight things to focus on a better distribution of data points, because it seems like there would be some very mathematically natural ways of doing this reweighting. Is this something you've experimented with?
… I suppose it may make more sense to do this reweighting for my purposes than for yours.
Since then, I've also tried a different kind of experiment to translate interpretable features across nets, this time on a simple generative model. Basically, the experiment just directly applied the natural abstraction hypothesis to the image-distributions produced by nets trained on the same data (using a first-order approximation).
When you say "directly applied", what do you mean?
That one worked a lot better, but didn't really connect to peak breadth or even say much about network internals in general.
Saying much about network internals seems as difficult as ever. I get the impression that these methods can't really do it, because they're too local: they can say something about how the network behaves on the data manifold, but networks that are internally very different can behave the same on the data manifold, so these methods can't distinguish those networks.
Meta: I'm going through a backlog of comments I never got around to answering. Sorry it took three months.
I've assumed it would be possible to reweight things to focus on a better distribution of data points, because it seems like there would be some very mathematically natural ways of doing this reweighting. Is this something you've experimented with?
Something along those lines might work; I didn't spend much time on it before moving to a generative model.
When you say "directly applied", what do you mean?
The actual main thing I did was to compute the SVD of the Jacobian of a generative network's output (i.e. the image) with respect to its input (i.e. the latent vector). Results of interest (a rough code sketch follows the two points below):
Conceptually, near-zero singular values indicate directions-in-image-space in which no small latent change will move the image, i.e. locally-inaccessible directions. Conversely, large singular values indicate "degrees of freedom" in the image. Relevant result: if I take two different trained generative nets, and find latents for each such that they both output approximately the same image, then they both roughly agree on which directions-in-image-space are local degrees of freedom.
By taking the SVD of the Jacobian of a chunk of the image with respect to the latent, we can figure out which directions-in-latent-space that chunk of image is locally sensitive to. And then, a rough local version of the natural abstraction hypothesis would say that nonadjacent chunks of image should strongly depend on the same small number of directions-in-latent-space, and be "locally independent" (i.e. not highly sensitive to the same directions-in-latent-space) given those few. And that was basically correct.
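In code, the two checks look roughly like this (a hypothetical PyTorch sketch: the toy decoder, image size, chunk boundaries, and subspace-overlap measure are all stand-in assumptions, not my actual setup):

```python
# Hypothetical sketch of the Jacobian-SVD experiments on a toy decoder.
# The decoder G, shapes, and chunking below are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
latent_dim, image_dim = 16, 28 * 28
G = nn.Sequential(nn.Linear(latent_dim, 128), nn.Tanh(),
                  nn.Linear(128, image_dim))
z = torch.randn(latent_dim)

# Jacobian of the image w.r.t. the latent: shape (image_dim, latent_dim).
J = torch.autograd.functional.jacobian(G, z)
U, S, Vh = torch.linalg.svd(J, full_matrices=False)
# Columns of U with large singular values: local degrees of freedom in
# image space; near-zero singular values: locally-inaccessible directions.
# The cross-net check compares these U columns across two nets evaluated
# at latents producing approximately the same image.

def chunk_directions(J, idx, k=3):
    # Top-k right-singular vectors of the chunk's rows of the Jacobian,
    # i.e. the latent directions that chunk is most sensitive to.
    _, _, Vh_c = torch.linalg.svd(J[idx], full_matrices=False)
    return Vh_c[:k]

top = chunk_directions(J, slice(0, 200))       # one chunk of pixels
bottom = chunk_directions(J, slice(584, 784))  # a nonadjacent chunk
# Singular values of the product of the two orthonormal bases are the
# cosines of the principal angles between the chunks' dominant latent
# subspaces; values near 1 mean both chunks depend on roughly the same
# few latent directions.
overlap = torch.linalg.svd(top @ bottom.T).S
print(overlap)
```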
To be clear, this was all "rough heuristic testing", not really testing predictions carefully derived from the natural abstraction framework.