Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders
Some support for the hypothesis that SAE feature instability is caused by each autoencoder tiling a manifold in a different way depending on its training seed. It doesn't attempt to actually find and describe the manifold, but suggests that doing so would be worthwhile.
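To make the hypothesis concrete, here is a toy illustration of what "tiling a manifold in different ways" would look like. It is not the post's method or data. Everything in it is an assumption chosen for simplicity: the manifold is a circle in the first two coordinates of a 3-D space, each hypothetical "SAE run" places 8 unit-norm features evenly around that circle with a random seed-dependent rotation, feature stability is measured as the mean best-match cosine similarity between runs, and subspace agreement is measured as how far one run's features stick out of the span of the other's.

```python
import math
import random


def tile_circle(n_features, offset, dim=3):
    """Hypothetical toy dictionary: n unit vectors evenly tiling a circle that
    lives in the first two coordinates, rotated by a seed-dependent offset."""
    feats = []
    for k in range(n_features):
        theta = offset + 2 * math.pi * k / n_features
        v = [0.0] * dim
        v[0], v[1] = math.cos(theta), math.sin(theta)
        feats.append(v)
    return feats


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_max_cosine(A, B):
    """Average, over features in A, of the cosine similarity to the best match
    in B. All vectors are unit-norm, so the dot product is the cosine."""
    return sum(max(dot(a, b) for b in B) for a in A) / len(A)


def orthonormal_basis(vectors, tol=1e-9):
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def max_residual(B, basis):
    """Largest norm of any B-feature's component outside span(basis)."""
    worst = 0.0
    for b in B:
        r = list(b)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        worst = max(worst, math.sqrt(dot(r, r)))
    return worst


def seed_run(seed, n_features=8):
    """Hypothetical 'training run': only the rotation of the tiling depends on the seed."""
    rng = random.Random(seed)
    offset = rng.uniform(0, 2 * math.pi / n_features)
    return tile_circle(n_features, offset)


A, B = seed_run(0), seed_run(1)
print(f"mean max cosine between seeds: {mean_max_cosine(A, B):.3f}")
print(f"dimension of span(A): {len(orthonormal_basis(A))}")
print(f"max residual of B outside span(A): {max_residual(B, orthonormal_basis(A)):.2e}")
```

In this toy, no feature from one seed exactly matches any feature from the other (the mean max cosine is below 1), yet both dictionaries span exactly the same 2-D plane. That is the pattern the title describes: unstable features, reproducible subspaces. A real SAE would of course need learned dictionaries and a proper subspace comparison (for example, principal angles) rather than this hand-built circle.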