Why not both? I imagine you could average the gradients so that you learn both at the same time.
It definitely could be. This isn’t the sense I get but I’m happy to be proven wrong here
Why not both? I imagine you could average the gradients so that you learn both at the same time.
It definitely could be. This isn’t the sense I get but I’m happy to be proven wrong here