When I said “the direction is always flat-->sharp”, I meant that their theorems showed you could produce a sharp minimum given a flat one, but not the other way around. Sorry if I was unclear.
Definitely agreed that “under which conditions does flatness imply generalization” is a very interesting question. I think this paper has a reasonably satisfying analysis, although I also have some reservations about the “SGD as Bayesian sampler” picture.