Quintin Pope comments on Gradient descent is not just more efficient genetic algorithms

Quintin Pope 10 Sep 2021 6:52 UTC
3 points
There should be a fair bit more than 2 epsilon of leeway in the line of equality. Since the submodules themselves are learned by SGD, they won’t be exactly equal. Most likely, the model will include dropout as well. Thus, the signals sent to the combining function will almost always differ by more than the limits of numerical precision. This means the combining function will need quite a bit of leeway; otherwise, the network’s performance is just zero always.
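Here is a rough sketch of what I mean, not a model of any real network. Everything in it is an assumption made for illustration: the hypothetical `submodule` function, the layer sizes, the 1e-3 weight mismatch standing in for "learned by SGD, so not exactly equal", the 0.5 dropout rate, and taking "epsilon" to be float32 machine epsilon.

```python
import numpy as np

rng = np.random.default_rng(0)

def submodule(x, w, drop_p, rng):
    """Hypothetical submodule: linear layer, inverted dropout, summed to a scalar."""
    h = x @ w
    mask = rng.random(h.shape) >= drop_p   # keep each unit with probability 1 - drop_p
    return (h * mask / (1.0 - drop_p)).sum()

x = rng.normal(size=16)
w_a = rng.normal(size=(16, 32))
# Assumption: the two copies end up with slightly different weights after training.
w_b = w_a + 1e-3 * rng.normal(size=w_a.shape)

eps = np.finfo(np.float32).eps             # assumed meaning of "epsilon"

# Disagreement between the two copies, without and with independent dropout masks.
gap_no_dropout = abs(submodule(x, w_a, 0.0, rng) - submodule(x, w_b, 0.0, rng))
gap_dropout = abs(submodule(x, w_a, 0.5, rng) - submodule(x, w_b, 0.5, rng))

print("2 * eps:            ", 2 * eps)
print("gap without dropout:", gap_no_dropout)
print("gap with dropout:   ", gap_dropout)
```

Under these assumptions, both gaps should come out well above 2 * eps, so an equality check with only that much tolerance would essentially never pass during training.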