As a toy example of moving from one algorithm to another, if the network is large enough we can just have the output be a linear combination of the two algorithms, then up-regulate one and down-regulate the other:
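A minimal sketch of that blend, for concreteness. Everything named here is hypothetical and not from the thread: `algo_a` and `algo_b` are arbitrary stand-in functions, and `alpha` is an assumed mixing weight swept from 0 to 1.

```python
def algo_a(x: float) -> float:
    """Hypothetical stand-in for the first algorithm."""
    return 2.0 * x


def algo_b(x: float) -> float:
    """Hypothetical stand-in for the second algorithm."""
    return x * x


def blended(x: float, alpha: float) -> float:
    """Linear combination of the two algorithms.

    alpha = 0.0 gives pure algo_a, alpha = 1.0 gives pure algo_b.
    Raising alpha up-regulates B and down-regulates A at the same time.
    """
    return (1.0 - alpha) * algo_a(x) + alpha * algo_b(x)


# Sweep the mixing weight from "all A" to "all B" on one input.
for step in range(5):
    alpha = step / 4
    print(f"alpha={alpha:.2f}  output={blended(3.0, alpha)}")
```

Both functions are present the whole time in this sketch; only their weights in the output change.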
Sure, but that’s not moving from A to B. That’s pruning from A+B to B. …which, now that I think about it, is effectively just a restatement of the Lottery Ticket Hypothesis[1].
Hm. I wonder if the Lottery Ticket Hypothesis holds for grok’d networks?
[1] https://arxiv.org/abs/1803.03635v1 etc.