This approach also makes lots of regularisation techniques transparent. Typically, regularisation corresponds to placing a prior over the weights/parameters of the model you're fitting. For example, L2 regularisation (aka ridge, aka weight decay) corresponds exactly to taking a Gaussian prior on the weights and finding the Maximum A Posteriori (MAP) estimate rather than the Maximum Likelihood estimate.
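To make the equivalence concrete, here is a minimal sketch in NumPy (the data, noise scale `sigma`, and prior scale `tau` are all illustrative assumptions): the ridge solution with penalty λ = σ²/τ² coincides with the MAP estimate under a Gaussian likelihood and a zero-mean Gaussian prior on the weights, because both minimise the same quadratic objective.

```python
import numpy as np

# Synthetic linear-regression data (illustrative values, not from the text)
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma = 0.1  # observation-noise std (assumed)
y = X @ w_true + sigma * rng.normal(size=n)

tau = 1.0               # std of the Gaussian prior w ~ N(0, tau^2 I) (assumed)
lam = sigma**2 / tau**2 # ridge penalty implied by that prior

# Ridge / L2-regularised least squares:
#   argmin_w ||y - Xw||^2 + lam * ||w||^2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# MAP estimate: maximise log p(y|w) + log p(w) with
#   y | w ~ N(Xw, sigma^2 I)  and  w ~ N(0, tau^2 I).
# Setting the gradient of the log-posterior to zero gives the
# same normal equations, with the prior supplying the lam * I term.
w_map = np.linalg.solve(X.T @ X + (sigma**2 / tau**2) * np.eye(d), X.T @ y)

print(np.allclose(w_ridge, w_map))  # the two estimates agree
```

The choice of λ is not arbitrary here: it is pinned down by the ratio of noise variance to prior variance, which is one reason the Bayesian reading of regularisation is illuminating.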