Would you say a similar critique holds for sparse autoencoders?
(edit: i’ve tended to think of SAEs and AOs as basically end-to-end tools for activation-space interpretability, but in hindsight i see AOs are definitely trying to be more “lines go up” and end-to-end than SAEs, even if there are many loss function variants for SAEs. i think i get your point now)
i think SAEs are a completely reasonable thing under the first worldview, and mostly crazy under the second worldview (with the exception of maybe bio or something where I’ve heard they’re genuinely useful)
(SAEs are not sufficient to actually understand things, but they are a genuine step on the way there)
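(for context on the "loss function variants" mentioned above: SAE objectives typically combine a reconstruction term with some sparsity penalty, and the variants mostly differ in that penalty. a minimal sketch of the common reconstruction-plus-L1 version, with all dimensions and weights hypothetical stand-ins for a trained model:)

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: d_model = residual-stream width,
# d_sae = (overcomplete) dictionary size.
d_model, d_sae = 16, 64

# Randomly initialized weights stand in for trained SAE parameters.
W_enc = rng.normal(scale=0.1, size=(d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(scale=0.1, size=(d_sae, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode activations into (hopefully sparse) features, then reconstruct."""
    f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU feature activations
    x_hat = f @ W_dec + b_dec               # linear reconstruction
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Standard SAE objective: reconstruction MSE plus an L1 sparsity penalty."""
    f, x_hat = sae_forward(x)
    mse = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.abs(f).sum(axis=-1).mean()
    return mse + sparsity

# A batch of fake residual-stream activations, just to exercise the code.
x = rng.normal(size=(8, d_model))
loss = sae_loss(x)
```

(variants swap the L1 term for e.g. a top-k constraint or other sparsity proxies, but the reconstruction-plus-penalty shape is the common core)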