Does the 1/sqrt(N) error for SGD assume single-pass? It seems like if we’re bottlenecked on few data points we can use multi-pass and do nearly as well as bayesian (at least for half-spaces).
Hi Mike:
Training for multiple epochs on the datapoints should totally be fair game, although I wouldn’t have guessed it to be necessary. I’ll try that.
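As a concrete sketch of what multi-pass training on a small fixed dataset could look like, here is a perceptron-style learner on a synthetic halfspace, revisiting the same points for many epochs (all names and sizes here are illustrative, not from the thread):

```python
import numpy as np

# Illustrative multi-pass SGD: perceptron updates on a small fixed
# dataset, reused across many epochs.
rng = np.random.default_rng(0)
d, n = 20, 100
w = rng.standard_normal(d)             # true halfspace normal
X = rng.standard_normal((n, d))        # the few data points we have
y = np.sign(X @ w)                     # labels from the true halfspace

v = np.zeros(d)
for epoch in range(50):                # multi-pass: revisit the same n points
    for i in rng.permutation(n):
        if np.sign(X[i] @ v) != y[i]:  # update only on mistakes
            v += y[i] * X[i]

train_acc = np.mean(np.sign(X @ v) == y)
```

Since the data is linearly separable by construction, repeated passes drive the training error down; the question in the thread is whether this also closes the gap to the Bayesian error on fresh samples.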
Does the 1/√N error for SGD assume single-pass?
I’m not quite sure how to apply the SGD bound here. The SGD bound is only about optimizing convex functions. The function we care about is:
L(v) = E_{x ~ N(0, I)} δ(sign(x·v), sign(x·w))
which is not convex.
(Here · is the dot product, and δ(a, b) = 1 if the arguments are equal, 0 otherwise.)
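For reference, L(v) can be estimated by Monte Carlo sampling from N(0, I). A minimal sketch (the function name `agreement` is ours); for halfspaces the expectation also has the closed form 1 − θ/π, where θ is the angle between v and w:

```python
import numpy as np

def agreement(v, w, n_samples=200_000, seed=0):
    """Monte Carlo estimate of L(v) = E_{x~N(0,I)} delta(sign(x.v), sign(x.w)),
    i.e. the probability that the halfspaces defined by v and w agree on a
    random Gaussian point. Closed form for halfspaces: 1 - theta/pi."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, len(v)))
    return np.mean(np.sign(x @ v) == np.sign(x @ w))
```

The non-convexity is visible here: L(v) depends on v only through its direction, so it is flat along rays and cannot be convex in v.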