Looking at your code, I see you still add an L1 penalty to the loss. Is this still necessary? In my own experiments I’ve noticed that top-k is able to achieve sparsity on its own, without the need for L1.
Although the code has the option to add an L1 penalty, in practice I set the l1_coeff to 0 in all my experiments (see main.py for all hyperparameters).
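(For other readers: here is a minimal sketch of how the two interact, assuming a standard PyTorch-style top-k SAE. The class and argument names below, like TopKSAE and l1_coeff, are illustrative rather than copied from the repo. The point is that the top-k operation already fixes the number of nonzero latents at exactly k per example, so the L1 term becomes an optional extra and setting its coefficient to 0 does not change the sparsity level.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative names; not the repo's actual API.
class TopKSAE(nn.Module):
    def __init__(self, d_in, d_hidden, k):
        super().__init__()
        self.encoder = nn.Linear(d_in, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_in)
        self.k = k

    def forward(self, x):
        acts = F.relu(self.encoder(x))
        # Top-k: keep the k largest activations per example, zero the rest.
        vals, idx = acts.topk(self.k, dim=-1)
        sparse_acts = torch.zeros_like(acts).scatter_(-1, idx, vals)
        return self.decoder(sparse_acts), sparse_acts

def sae_loss(x, recon, sparse_acts, l1_coeff=0.0):
    loss = F.mse_loss(recon, x)
    # Optional L1 term; with l1_coeff = 0 it is a no-op, since top-k already
    # enforces exactly k nonzero latents per example.
    if l1_coeff > 0:
        loss = loss + l1_coeff * sparse_acts.abs().sum(dim=-1).mean()
    return loss
```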
Oh, I see that, thanks! :) Super interesting work. I’m testing its application to recommender systems.