You may be able to notice data points where the SAE performs unusually badly at reconstruction? (Which is what you’d see if there’s a crucial missing feature)
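A minimal sketch of what that check might look like (the `sae.encode`/`sae.decode` interface, the squared-error metric, and the quantile threshold are all assumptions for illustration, not something from this thread):

```python
import torch

def flag_high_reconstruction_error(acts, sae, reference_acts, quantile=0.999):
    """Flag activation vectors whose SAE reconstruction error is unusually high.

    `sae` is a hypothetical module exposing encode()/decode(); the real
    interface depends on whichever SAE implementation you have lying around.
    """
    def recon_error(x):
        with torch.no_grad():
            # Squared reconstruction error per data point.
            return ((sae.decode(sae.encode(x)) - x) ** 2).sum(dim=-1)

    # Calibrate "unusually badly" against trusted reference activations.
    threshold = torch.quantile(recon_error(reference_acts), quantile)
    errors = recon_error(acts)
    return errors > threshold, errors
```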
Would you expect this to outperform doing the same thing with a non-sparse autoencoder (that has a lower latent dimension than the NN’s hidden dimension)? I’m not sure why it would, given that we aren’t using the sparse representations except to map them back (so any type of capacity constraint on the latent space seems fine). If dense autoencoders work just as well for this, they’d probably be more straightforward to train? (unless we already have an SAE lying around from interp anyway, I suppose)
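For concreteness, the dense baseline being imagined here is just something like the following (layer sizes are placeholders), plugged into the same reconstruction-error check above:

```python
import torch
import torch.nn as nn

class DenseBaselineAE(nn.Module):
    """Hypothetical dense-autoencoder baseline: same reconstruction-error test,
    but the capacity constraint is a narrow latent dimension rather than sparsity."""

    def __init__(self, hidden_dim=4096, latent_dim=512):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, hidden_dim)

    def encode(self, x):
        return torch.relu(self.encoder(x))

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        return self.decode(self.encode(x))
```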
A regular AE’s job is to throw away the information outside some low-dimensional manifold; a sparse ~linear AE’s job is to throw away the information not represented by sparse dictionary codes. (Also a low-dimensional manifold, I guess, just one built from a different prior.)
If an AE is reconstructing poorly, that means it’s throwing away a lot of information. How important that information is seems like a question about which manifold the underlying network “really” generalizes according to. And also about what counts as an anomaly / what kinds of outliers you’re even trying to detect.
Ah, yeah, that makes sense.