Thanks for your reply. It’s a fascinating topic and I’ve got lots of follow-up questions but I’ll read the paper and book first to get a better idea of which questions have already been addressed.
(edit 2 days later): Whoa. There’s a lot of material in the book, in your paper and in those from your research group. I didn’t realize that one could say so much about flatness! It’s very likely I have misunderstood, but are you guys talking about why a model seems to end up on a particular part of a (high-dimensional) ridge/plateau of the loss function? The relationship between parameter perturbations and data perturbations is interesting. Do you think robustness to parameter perturbations is acting as a proxy for robustness to data perturbations, which is what we really want? Also, on a more technical note, is Hironaka’s theorem of use when the loss function is effectively piecewise quadratic? Are you concerned that collapsing down to a simple/robust program/function appears to be a one-way process (i.e. it doesn’t look like you could undo it)?
There are too many questions here, and there’s no obligation to answer them. I will continue reading around the topic when I have time. Perhaps one day I can write things up for sharing.
Program synthesis is an interesting direction to take these ideas. I hope it pays off. It’s pretty hard to judge. I guess animals need to be robust to parts of their nervous system malfunctioning and people need to be robust to parts of their belief system falling through. Compartmentalisation of the programs/concepts would help with this.
Hi Liam, you’ve made a really nice resource here. Thanks. I think you need to put the det inside a log (so you can explode it a bit later!). It’s in the equation a little above “Examples of Singular Loss Landscapes”.
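For concreteness, here is the version I have in mind — just a sketch, assuming the equation in question is the standard Laplace/BIC-style expansion of the negative log marginal likelihood, and using my own notation ($L_n$ for the empirical loss, $w_0$ the optimum, $d$ the parameter dimension, $H$ the Hessian, $\varphi$ the prior), which may not match yours:

```latex
% Laplace approximation to Z_n = \int e^{-n L_n(w)} \varphi(w)\, dw  (regular case)
-\log Z_n \;\approx\; n L_n(w_0)
  \;+\; \frac{d}{2}\log\frac{n}{2\pi}
  \;+\; \frac{1}{2}\log\det H(w_0)
  \;-\; \log\varphi(w_0)
```

With the determinant written inside the log, the singular case is transparent: when $H(w_0)$ is degenerate we get $\det H(w_0) = 0$, so the $\tfrac{1}{2}\log\det H$ term diverges to $-\infty$ — which, as I understand it, is exactly the breakdown that motivates replacing $\tfrac{d}{2}\log n$ with the $\lambda \log n$ correction in the singular theory.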