cloud comments on Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

cloud 12 Sep 2025 6:55 UTC
4 points
Figure 4 in the paper shows the performance of gradient routing in a toyish setting (a small LM trained on synthetic children’s stories). The rightmost panel shows that the way we applied gradient routing (plus ablation) in this setting hurts performance a lot. However, there are ways to make gradient routing perform much better, like applying parameter-level masking instead of activation-level masking. These are the subject of ongoing work.
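
To make the distinction concrete, here is a minimal sketch of one way to read the two masking styles. It is not the paper's implementation. It uses plain Python on a hypothetical two-layer linear model with loss 0.5 * (o - target)^2. All names (`forward`, `grads`, `mask_param_grads`, `act_mask`) and the specific masks are assumptions for illustration. Activation-level masking is modeled as zeroing the gradient that flows back through selected hidden units; parameter-level masking is modeled as zeroing selected entries of the parameter gradients just before the update.

```python
def forward(W, v, x):
    """Hidden units h = W x, scalar output o = v . h."""
    h = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(W))]
    o = sum(v[i] * h[i] for i in range(len(h)))
    return h, o


def grads(W, v, x, target, act_mask=None):
    """Manual backprop for L = 0.5 * (o - target)^2.

    If act_mask is given, the gradient flowing back through hidden unit i
    is multiplied by act_mask[i] (assumed form of activation-level masking).
    """
    h, o = forward(W, v, x)
    err = o - target                                  # dL/do
    grad_v = [err * h[i] for i in range(len(h))]      # dL/dv, downstream of h
    grad_h = [err * v[i] for i in range(len(h))]      # dL/dh
    if act_mask is not None:
        grad_h = [m * g for m, g in zip(act_mask, grad_h)]
    grad_W = [[grad_h[i] * x[j] for j in range(len(x))] for i in range(len(W))]
    return grad_W, grad_v


def mask_param_grads(grad_W, grad_v, mask_W, mask_v):
    """Assumed form of parameter-level masking: elementwise mask on each gradient."""
    masked_W = [[m * g for m, g in zip(mrow, grow)]
                for mrow, grow in zip(mask_W, grad_W)]
    masked_v = [m * g for m, g in zip(mask_v, grad_v)]
    return masked_W, masked_v


# Hypothetical toy values.
W = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
x = [1.0, 2.0]
target = 0.0

# Activation-level: block gradient through hidden unit 1.
act_W, act_v = grads(W, v, x, target, act_mask=[1.0, 0.0])

# Parameter-level: compute full gradients, then mask individual parameters.
full_W, full_v = grads(W, v, x, target)
par_W, par_v = mask_param_grads(
    full_W, full_v,
    mask_W=[[1.0, 1.0], [0.0, 0.0]],
    mask_v=[1.0, 0.0],
)
```

In this toy case the activation mask only changes the gradient for `W`, which sits upstream of the masked hidden unit, while `v` still receives its full gradient. The parameter mask can be chosen independently for every parameter, including `v`. Whether that extra control is what drives the better performance mentioned above is not something this sketch can show.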