I just mean that it’s relatively easy to prove theorems in the Bayesian setting. More precisely, if you decide that the probability of a parameter is determined by the data and model via Bayes’ rule, that is a relatively simple setup compared to, e.g., deciding that the probability of a parameter is an integral over all possible paths taken by something like SGD from initialisation. From this simplicity we can derive things like Watanabe’s free energy formula, which currently has no analogue for the latter model of the probability of a parameter.
That theorem is far from trivial, but there still seems to be a lot more “surface area” to grip the problem when you first think about it from a Bayesian perspective and then ask what the gap is from there to SGD (even if SGD is what you ultimately care about).
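
For concreteness, here is the free energy formula in roughly the form it usually takes in singular learning theory. The notation is my own choice for this sketch, not something fixed above: $L_n$ is the empirical negative log-likelihood, $\varphi$ the prior, $w_0$ an optimal parameter, $\lambda$ the learning coefficient (RLCT) and $m$ its multiplicity, and the statement holds under Watanabe's technical conditions on the model and prior.

```latex
% Bayesian free energy (negative log marginal likelihood) for n samples
F_n = -\log \int \exp\bigl(-n L_n(w)\bigr)\, \varphi(w)\, dw

% Asymptotic expansion as n -> infinity
F_n = n L_n(w_0) + \lambda \log n - (m - 1) \log\log n + O_p(1)
```

The point of writing it out is that every term is a functional of the posterior alone. A path-integral description over SGD trajectories gives no obvious quantity that would play the role of $F_n$.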