Thanks for your reply. It’s a fascinating topic and I’ve got lots of follow-up questions but I’ll read the paper and book first to get a better idea of which questions have already been addressed.
(edit 2 days later): Whoa. There’s a lot of material in the book, in your paper and in those from your research group. I didn’t realize that one could say so much about flatness! It’s very likely I have misunderstood, but are you guys talking about why a model seems to end up on a particular part of a (high-dimensional) ridge/plateau of the loss function? The relationship between parameter perturbations and data perturbations is interesting. Do you think robustness to parameter perturbations is acting as a proxy for robustness to data perturbations, which is what we really want? Also, on a more technical note, is Hironaka’s theorem of use when the loss function is effectively piecewise quadratic? Are you concerned that collapsing down to a simple/robust program/function appears to be a one-way process (i.e. it doesn’t look like you could undo it)?
There are too many questions here, and there’s no obligation to answer them. I will continue reading around the topic when I have time. Perhaps one day I can write things up for sharing.
Program synthesis is an interesting direction to take these ideas. I hope it pays off. It’s pretty hard to judge. I guess animals need to be robust to parts of their nervous system malfunctioning and people need to be robust to parts of their belief system falling through. Compartmentalisation of the programs/concepts would help with this.
Hi Liam, you’ve made a really nice resource here. Thanks. I think you need to put the det inside a log (so you can explode it a bit later!). It’s in the equation a little above “Examples of Singular Loss Landscapes”.
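For concreteness, here is the version I have in mind — just a sketch, assuming the equation in question is the standard Laplace/BIC-style expansion of the negative log marginal likelihood, and using my own notation ($L_n$ for the empirical loss, $w_0$ the optimum, $d$ the parameter dimension, $H$ the Hessian, $\varphi$ the prior), which may not match yours:

```latex
% Laplace approximation to Z_n = \int e^{-n L_n(w)} \varphi(w)\, dw  (regular case)
-\log Z_n \;\approx\; n L_n(w_0)
  \;+\; \frac{d}{2}\log\frac{n}{2\pi}
  \;+\; \frac{1}{2}\log\det H(w_0)
  \;-\; \log\varphi(w_0)
```

With the determinant written inside the log, the singular case is transparent: when $H(w_0)$ is degenerate we get $\det H(w_0) = 0$, so the $\tfrac{1}{2}\log\det H$ term diverges to $-\infty$ — which, as I understand it, is exactly the breakdown that motivates replacing $\tfrac{d}{2}\log n$ with the $\lambda \log n$ correction in the singular theory.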