Sorry if I missed it—do you have any experimental results on how much gradient routing degrades task performance compared to normal training?
Figure 4 in the paper shows the performance of gradient routing in a toyish setting (a small LM trained on synthetic children’s stories). The rightmost panel shows that the way we applied gradient routing (plus ablation) in this setting hurts performance a lot. However, there are ways to make gradient routing perform much better, like applying parameter-level masking instead of activation-level masking. These are the subject of ongoing work.
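To make the distinction concrete, here is a minimal sketch of the two masking schemes in a PyTorch-style setup. This is illustrative only, not the authors' implementation: the layer sizes, the choice of which units or rows are masked, and the helper names are all assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny two-layer MLP standing in for the LM; sizes are arbitrary.
lin1, lin2 = nn.Linear(16, 32), nn.Linear(32, 16)
params = list(lin1.parameters()) + list(lin2.parameters())

# Activation-level masking: the forward pass is unchanged, but gradients
# from this batch only flow through the masked-in hidden units.
act_mask = torch.zeros(32)
act_mask[:16] = 1.0  # hypothetical half of the hidden units

def forward_routed(x, mask):
    h = torch.relu(lin1(x))
    # Forward value equals h; backward flows only through the masked-in part.
    h = h * mask + (h * (1.0 - mask)).detach()
    return lin2(h)

# Parameter-level masking: run a normal forward/backward, then zero the
# gradients of parameters outside the subnetwork assigned to this data.
param_masks = {id(lin1.weight): torch.zeros_like(lin1.weight)}
param_masks[id(lin1.weight)][:16, :] = 1.0  # hypothetical row assignment

def mask_param_grads():
    for p in params:
        m = param_masks.get(id(p))
        if m is not None and p.grad is not None:
            p.grad.mul_(m)

# Usage on a batch from the routed data partition (toy data):
x, y = torch.randn(8, 16), torch.randn(8, 16)
loss = ((forward_routed(x, act_mask) - y) ** 2).mean()
loss.backward()
mask_param_grads()  # in practice you'd pick one scheme, not stack both
```

The point of the contrast: activation-level masking constrains where gradients can flow through the representation, while parameter-level masking directly restricts which weights a given data partition is allowed to update.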