Some things I didn’t explain about batch normalization in this post:
Why batch normalization reduces the need for regularization (see section 3.4 in the paper).
Newer techniques that build on batch normalization (such as layer normalization), and the limitations of batch normalization that motivated them.
Things I’m not sure about:
I may have messed up my explanation of why we use the learned parameters γ and β; this was something I didn’t fully understand. There may also be an error in the way I have set up the batch normalization step (a reference sketch of the standard step follows this list); in particular, I’m unsure whether I am using “input distribution” accurately and consistently.
I might have been a bit unclear in the one-dimensional neural network example. If that example doesn’t make sense, try reading the cited section of the Deep Learning Book.
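For reference, here is a minimal NumPy sketch of the batch normalization step as described in the paper. The function name, shapes, and epsilon value are my own assumptions for illustration, not code from the post:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch_size, num_features)."""
    mu = x.mean(axis=0)                     # per-feature mean over the mini-batch
    var = x.var(axis=0)                     # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * x_hat + beta             # scale and shift with the learned gamma, beta
```

One way to see why γ and β are there (this is the paper’s own argument): if the network learns gamma = sqrt(var + eps) and beta = mu, the layer can undo the normalization entirely, so normalizing never reduces what the layer can represent.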