You may be able to notice data points where the SAE performs unusually badly at reconstruction? (Which is what you’d see if there’s a crucial missing feature)
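A minimal sketch of what that check might look like (the `sae.encode`/`sae.decode` interface, the squared-error metric, and the quantile threshold are all assumptions for illustration, not something from this thread):

```python
import torch

def flag_high_reconstruction_error(acts, sae, reference_acts, quantile=0.999):
    """Flag activation vectors whose SAE reconstruction error is unusually high.

    `sae` is a hypothetical module exposing encode()/decode(); the real
    interface depends on whichever SAE implementation you have lying around.
    """
    def recon_error(x):
        with torch.no_grad():
            # Squared reconstruction error per data point.
            return ((sae.decode(sae.encode(x)) - x) ** 2).sum(dim=-1)

    # Calibrate "unusually badly" against trusted reference activations.
    threshold = torch.quantile(recon_error(reference_acts), quantile)
    errors = recon_error(acts)
    return errors > threshold, errors
```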
Would you expect this to outperform doing the same thing with a non-sparse autoencoder (that has a lower latent dimension than the NN’s hidden dimension)? I’m not sure why it would, given that we aren’t using the sparse representations except to map them back (so any type of capacity constraint on the latent space seems fine). If dense autoencoders work just as well for this, they’d probably be more straightforward to train? (unless we already have an SAE lying around from interp anyway, I suppose)
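For concreteness, the dense baseline being imagined here is just something like the following (layer sizes are placeholders), plugged into the same reconstruction-error check above:

```python
import torch
import torch.nn as nn

class DenseBaselineAE(nn.Module):
    """Hypothetical dense-autoencoder baseline: same reconstruction-error test,
    but the capacity constraint is a narrow latent dimension rather than sparsity."""

    def __init__(self, hidden_dim=4096, latent_dim=512):
        super().__init__()
        self.encoder = nn.Linear(hidden_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, hidden_dim)

    def encode(self, x):
        return torch.relu(self.encoder(x))

    def decode(self, z):
        return self.decoder(z)

    def forward(self, x):
        return self.decode(self.encode(x))
```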
A regular AE’s job is to throw away the information outside some low-dimensional manifold; a sparse ~linear AE’s job is to throw away the information not represented by sparse dictionary codes. (Also a low-dimensional manifold, I guess, just one built from a different prior.)
If an AE is reconstructing poorly, that means it’s throwing away a lot of information. How important that information is seems like a question about which manifold the underlying network “really” generalizes according to. And also about what counts as an anomaly / what kinds of outliers you’re even trying to detect.
Ah, yeah, that makes sense.