TLW comments on Hypothesis: gradient descent prefers general circuits

TLW 15 Feb 2022 3:06 UTC
2 points
As a toy example of moving from one algorithm to another, if the network is large enough we can just have the output be a linear combination of the two algorithms and up-regulate one, and down-regulate the other:
Sure, but that’s not moving from A to B. That’s pruning from A+B to B. …which, now that I think about it, is effectively just a restatement of the Lottery Ticket Hypothesis[1].
Hm. I wonder if the Lottery Ticket Hypothesis holds for grok’d networks?
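To make the A+B point concrete, here is a minimal sketch of the quoted toy example. Everything in it is made up for illustration: `algo_a` (a lookup table), `algo_b` (a general rule), the mixing weights `w_a` and `w_b`, and the data. It only shows that sliding the weights from "all A" to "all B" leaves the training outputs untouched when both sub-circuits already fit the training set; it says nothing about what gradient descent would actually do.

```python
# Hypothetical toy setup: two "algorithms" that agree on the training inputs.
def algo_a(x):
    # Memorising lookup table; falls back to 0.0 off the training set.
    table = {0: 0.0, 1: 2.0, 2: 4.0}
    return table.get(x, 0.0)

def algo_b(x):
    # General rule that happens to match the table on the training inputs.
    return 2.0 * x

def network(x, w_a, w_b):
    # The output is a linear combination of the two sub-circuits.
    return w_a * algo_a(x) + w_b * algo_b(x)

train_inputs = [0, 1, 2]

# Slide the mixing weights from (1, 0) to (0, 1), keeping w_a + w_b = 1.
steps = 4
path = [(1 - t / steps, t / steps) for t in range(steps + 1)]

# Training outputs at every point along the path.
train_outputs = [[network(x, w_a, w_b) for x in train_inputs] for w_a, w_b in path]

# Held-out input (x = 3) where the two algorithms disagree.
test_outputs = [network(3, w_a, w_b) for w_a, w_b in path]

for (w_a, w_b), tr, te in zip(path, train_outputs, test_outputs):
    print(f"w_a={w_a:.2f} w_b={w_b:.2f} train={tr} test(x=3)={te}")
```

Both sub-circuits exist in the network from the first step to the last. The endpoint is just the starting network with the A branch zeroed out, which is why this reads as pruning rather than as one algorithm turning into another.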
[1] https://arxiv.org/abs/1803.03635v1 etc.