I’m not really talking about true information loss, more like the computation getting repeated that doesn’t need to be.
And yes the feedforward blocks can be like 1 or 2 layers deep, so I am open to this being either a small or a big issue, depending on the exact architecture.
I’m not really talking about true information loss, more like the computation getting repeated that doesn’t need to be.
And yes the feedforward blocks can be like 1 or 2 layers deep, so I am open to this being either a small or a big issue, depending on the exact architecture.