A $1 training run would be training 6 SAEs across 6 sparsities at 16K width on Gemma-2-2B for 200M tokens. This cost includes generating the activations; it would be cheaper if the activations were precomputed. In practice this seems like a large enough scale to validate ideas such as the Matryoshka SAE or the BatchTopK SAE.
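For reference, here is a rough sketch of what the BatchTopK idea looks like in code, assuming the usual ReLU-encoder / linear-decoder setup; the width matches the 16K mentioned above, but d_model, k, and the initialization are illustrative assumptions rather than details from this post:

```python
import torch
import torch.nn as nn

class BatchTopKSAE(nn.Module):
    def __init__(self, d_model: int, d_sae: int, k: int):
        super().__init__()
        self.k = k
        self.W_enc = nn.Parameter(torch.randn(d_model, d_sae) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_sae))
        self.W_dec = nn.Parameter(self.W_enc.data.T.clone())
        self.b_dec = nn.Parameter(torch.zeros(d_model))

    def forward(self, x: torch.Tensor):
        # encoder pre-activations
        acts = torch.relu((x - self.b_dec) @ self.W_enc + self.b_enc)
        # keep the top (k * batch_size) activations across the whole batch,
        # rather than the top k per example as in a vanilla TopK SAE
        flat = acts.flatten()
        n_keep = self.k * x.shape[0]
        topk = torch.topk(flat, n_keep)
        mask = torch.zeros_like(flat)
        mask.scatter_(0, topk.indices, 1.0)
        sparse_acts = (flat * mask).view_as(acts)
        recon = sparse_acts @ self.W_dec + self.b_dec
        return recon, sparse_acts

# usage sketch: 16K-wide SAE on Gemma-2-2B's 2304-dim residual stream
# sae = BatchTopKSAE(d_model=2304, d_sae=16_384, k=64)
```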
Yeah, if you’re doing this, you should definitely precompute and save activations.
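A minimal sketch of precomputing and caching activations with HuggingFace transformers, under some stated assumptions: the layer index, the text batch, and the save path are placeholders, and in practice you'd stream many shards rather than a single batch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-2-2b"
LAYER = 12                     # hypothetical: whichever residual-stream layer the SAE targets
SAVE_PATH = "acts_shard_0.pt"  # hypothetical path for one shard of cached activations

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

texts = ["Example text to tokenize."]  # in practice, a stream of pretraining-style text
batch = tokenizer(texts, return_tensors="pt", padding=True).to(model.device)

with torch.no_grad():
    out = model(**batch, output_hidden_states=True)

# hidden_states[LAYER] is the residual stream after block LAYER; flatten the
# (batch, seq, d_model) tensor and save it so later SAE training runs skip the
# forward pass through Gemma-2-2B entirely.
acts = out.hidden_states[LAYER]
torch.save(acts.reshape(-1, acts.shape[-1]).cpu(), SAVE_PATH)
```

The payoff is that all six sparsity settings can then train against the same cached shards instead of each re-running the base model over 200M tokens.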