Replaced with "Gradient routing is better than pretraining filtering."
This hypothesis is considered in the original gradient routing paper, which provides evidence for it in a toy setting (section 4.2.2; section 4.3 also compares gradient routing to data filtering in RL). It might help readers if you rephrased your post to make the connection to existing work clearer, particularly in the “Why Gradient Routing Handles Imperfect Labels Better” section. (There is similar reasoning in the first paragraph of the paper's Discussion.)
That said, thanks for raising this point and for the concrete proposal! I think this would be a great experiment. You might be glad to know that there are a couple of ongoing projects investigating similar questions. Hopefully they will share results in the next couple of months. (Also: you might be interested in the discussions of absorption here.)
Thanks Alex, I should’ve read the paper more closely! I’ve replaced the shortform with a post which includes the results from the paper.
Nit: The title gives the impression of a demonstrated result as opposed to a working hypothesis and proposed experiment.
Good point, thanks Lucas.