For what it’s worth, we believe that a mechanistic estimator can beat all sampling-based methods, no matter how sophisticated they are. The philosophical reason for this is that sophisticated sampling-based methods outperform simple Monte Carlo by exploiting structure in the function whose average value they’re estimating—but a mechanistic estimator can exploit that same structure, too.
In fact, I think it almost follows from the MSP that we can beat any sampling-based method. To see this, suppose you have some sophisticated estimator Est(Mθ,r), which is given a neural net Mθ and some random coin flips r as input, and produces an unbiased, low-variance estimate of E[Mθ] using r. Now, define the architecture M′ as: M′θ(x)=Est(Mθ,x). The MSP says that we need to be able to estimate the average output of M′θ (which is the same as the average output of Mθ) with squared error less than or equal to the variance of M′θ (which is just the variance of the sampling estimator), in the time that it takes to run M′θ. (We’re taking ε=1 here.) In other words, given any sophisticated sampling algorithm for estimating the average output of Mθ, there needs to be a corresponding mechanistic estimator that gets lower (or equal) error in the same amount of time.
(I think this argument isn’t perfectly tight, because it’ll probably run into the same uniformity issues that I discussed in the “getting rid of ε” appendix, which is why I said “almost follows” rather than “follows”.)
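To make the wrapper construction above concrete, here is a minimal Python sketch. The toy function standing in for Mθ and the antithetic-sampling estimator standing in for Est are both illustrative choices of mine, not anything from the post:

```python
import numpy as np

rng = np.random.default_rng(0)

def M_theta(x):
    # Toy stand-in for the neural net M_theta; any function of x would do.
    return np.tanh(3.0 * x) + 0.1 * x ** 2

def Est(M, r):
    # Toy "sophisticated" estimator: antithetic sampling over x ~ N(0, 1).
    # It is unbiased for E[M], and lower-variance than a single sample of M here.
    return 0.5 * (M(r) + M(-r))

def M_prime_theta(x):
    # The wrapped architecture M'_theta(x) = Est(M_theta, x). One forward pass
    # of M' runs the sampling estimator once, so E[M'] = E[M] while Var[M']
    # is the variance of the sampling estimator.
    return Est(M_theta, x)

xs = rng.standard_normal(1_000_000)
print("E[M]    =", M_theta(xs).mean())        # ~0.1
print("E[M']   =", M_prime_theta(xs).mean())  # same average output, ~0.1
print("Var[M]  =", M_theta(xs).var())         # variance of a single sample of M
print("Var[M'] =", M_prime_theta(xs).var())   # much smaller: the estimator's variance
```

Applying the MSP to M′ then demands a mechanistic estimate whose squared error is at most that last (much smaller) variance, in roughly the time of the two forward passes of Mθ that M′θ costs in this toy case, which is exactly the reduction in the argument.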
Right, like many people, I agree that in most cases derandomization is a good idea if you can afford a little extra code complexity. But the structure of your argument makes it sound like derandomization is the important part, when the important part is actually taking advantage of structure, sampling or not.
Concretely, I would change “Understanding structure helps outperform sampling” to simply “Understanding structure helps”.
Thanks for the suggestion!