StefanHex comments on How To Do Patching Fast

StefanHex 13 May 2024 10:13 UTC
1 point
0
It took me a second to understand why “edge patching” can work with only 1 forward pass. I’m rephrasing my understanding here in case it helps anyone else:
If we path patch node X in layer 1 to node Z in layer 3, then the only way to know what the input to node Z looks like without node X is to actually run a forward pass. Thus we need to run a forward pass for every target node that we want to receive a different set of inputs.
However, if we path patch (edge patch) node X in layer 1 to node Y in layer 2, then we can calculate the new input to node Y “by hand” (i.e. cheaply, without running the model): the input to node Y is just the sum of the outputs of the nodes in the previous layers. So you can skip all the “compute what the input would look like” forward passes.
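To make the “by hand” step concrete, here is a minimal toy sketch in plain Python. It is only an illustration of the arithmetic, not the post's implementation: the node names (`embed`, `X`, `W`), the 2-dimensional vectors, and the helper functions are all made up, and it assumes the outputs of the nodes upstream of Y have already been cached from one clean run and one corrupted run.

```python
# Toy residual-stream sketch (hypothetical names and numbers throughout).
# Assumption: the input to node Y is the sum of the outputs of every
# upstream node, and those outputs were cached from earlier runs.

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def vec_sub(a, b):
    return [x - y for x, y in zip(a, b)]

def node_input(upstream_outputs):
    """Sum the cached outputs of all upstream nodes to get a node's input."""
    dim = len(next(iter(upstream_outputs.values())))
    total = [0.0] * dim
    for out in upstream_outputs.values():
        total = vec_add(total, out)
    return total

# Cached outputs of the nodes upstream of Y.
clean = {"embed": [1.0, 2.0], "X": [0.5, -1.0], "W": [3.0, 0.0]}
corrupt = {"embed": [0.0, 1.0], "X": [-2.0, 4.0], "W": [1.0, 1.0]}

clean_input_y = node_input(clean)

# Edge patch X -> Y: swap only X's contribution, with no extra forward pass.
patched_input_y = vec_add(vec_sub(clean_input_y, clean["X"]), corrupt["X"])

# Reference: rebuild the sum from scratch with X's output replaced.
reference = node_input({**clean, "X": corrupt["X"]})

print(patched_input_y, reference)
```

Both ways of computing Y's patched input agree, and the first needs only a subtraction and an addition on cached values, which is why no additional forward pass is required for this step.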
- Joseph Miller 14 May 2024 17:33 UTC
  1 point
  0
  I’m not sure if this is intentional, but this explanation implies that edge patching can only be done between nodes in adjacent layers, which is not the case.