Hmm, this might come down to how independent the parameters are. I think in large networks there will generally be enough independence between the parameters for local minima to be rare (although possible).
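One way to make the independence intuition concrete is a deliberately simplified toy model. This is an assumption, not something established in this thread. At a critical point, assume the curvature along each of the n parameter directions is independently positive or negative with probability 1/2. The point is a local minimum only if every direction curves upward, which under this model happens with probability 2^-n. The sketch below estimates that fraction by simulation. The function name `fraction_minima` is hypothetical. Real Hessians have correlated eigenvalues, so this only illustrates the "independent parameters" case and says nothing about whether that assumption survives training.

```python
import random

def fraction_minima(n_params: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate how often a random critical point is a local minimum (toy model).

    Assumption (toy model only): the curvature along each of n_params directions
    is positive with probability 1/2, independently of the other directions.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    minima = 0
    for _ in range(trials):
        # Counts as a minimum only if every direction curves upward.
        if all(rng.random() < 0.5 for _ in range(n_params)):
            minima += 1
    return minima / trials

if __name__ == "__main__":
    # Compare the simulated fraction with the exact toy-model value 2**-n.
    for n in (1, 2, 5, 10, 20):
        print(n, fraction_minima(n), 0.5 ** n)
```

Under these assumptions the fraction of critical points that are minima shrinks exponentially with the number of parameters. Saddle points are then the typical case, and minima are rare but still possible.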
I can see how the initial parameters are independent. After a significant amount of training though...?