Daniel Tan comments on Daniel Tan’s Shortform

Daniel Tan 2 Jan 2025 0:14 UTC
Related idea: If we can show that models themselves learn features of different granularity, we could then test whether SAEs reflect this difference. (I expect they do not.) If they don't, that would imply that SAEs capture properties of the data rather than the model.