Anticorrelated Noise Injection for Improved Generalization

Just a study I saw on /r/MachineLearning: link.

Basically, one way of training neural networks is to inject random noise into the parameters during training. Usually that noise is drawn independently at each training step, but in the paper they make it negatively correlated between consecutive steps, and argue that this improves generalization because it biases the optimizer toward flatter minima.
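For anyone who wants the mechanics, here is a minimal sketch of the idea as I understand it: instead of adding a fresh i.i.d. Gaussian perturbation at every step, you add the difference of two consecutive i.i.d. draws, which makes successive perturbations negatively correlated. The function names, hyperparameters, and the exact placement of the noise are my own illustration, not taken from the paper.

```python
import numpy as np

def noisy_gd(grad, w0, lr=0.1, sigma=0.01, steps=1000, anticorrelated=True, seed=None):
    """Gradient descent with parameter noise injection.

    anticorrelated=False: add i.i.d. Gaussian noise each step (standard perturbed GD).
    anticorrelated=True:  add xi_new - xi_old, the increment of i.i.d. draws, so
    consecutive perturbations are negatively correlated (my reading of the paper's
    "anticorrelated" variant; where exactly the noise enters is an assumption here).
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    xi_prev = np.zeros_like(w)
    for _ in range(steps):
        xi = rng.normal(0.0, sigma, size=w.shape)
        noise = (xi - xi_prev) if anticorrelated else xi
        xi_prev = xi
        w = w - lr * grad(w) + noise
    return w

# Toy usage on a 1-D quadratic (hypothetical example, not from the paper):
if __name__ == "__main__":
    grad = lambda w: 2.0 * w  # gradient of L(w) = w**2
    print(noisy_gd(grad, w0=np.array([1.0])))
```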

This seems conceptually related to ideas that have been discussed on LessWrong, e.g. John Wentworth's observation that search tends to lead to flat minima, which may have beneficial properties.

I would have liked to see them test this on harder problems than the ones they used, and/or on a greater variety of real-world problems.