Gurkenglas comments on Gradient routing is better than pretraining filtering