Hi Mike: Training for multiple epochs on the datapoints should totally be fair game, although I wouldn’t have guessed it to be necessary. I’ll try that.
Does the error bound for SGD assume a single pass?
I’m not quite sure how to apply the SGD bound here. The SGD bound is only about optimizing convex functions. The function we care about is:
which is not convex.
(Here . denotes the dot product, and delta is 1 if its arguments are equal and 0 otherwise.)
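To make the non-convexity concrete, here is a minimal sketch on a toy stand-in, not the actual function from this thread. It assumes a hypothetical 0-1 style objective, `1 - delta(sign(w . x), y)`, and checks the midpoint inequality f((a+b)/2) <= (f(a)+f(b))/2 that any convex function must satisfy. The names `toy_loss`, `a`, `b`, `x` and `y` are made up for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def sign(t):
    """Return -1, 0, or 1 according to the sign of t."""
    return (t > 0) - (t < 0)


def delta(p, q):
    """1 if the arguments are the same, 0 otherwise."""
    return 1.0 if p == q else 0.0


def toy_loss(w, x, y):
    """Hypothetical 0-1 style objective: 0 when sign(w . x) matches y, else 1."""
    return 1.0 - delta(sign(dot(w, x)), y)


# One toy datapoint and two weight vectors (all values are assumptions).
x, y = [1.0, 0.0], 1
a, b = [-3.0, 0.0], [1.0, 0.0]
mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]

lhs = toy_loss(mid, x, y)                            # value at the midpoint
rhs = 0.5 * (toy_loss(a, x, y) + toy_loss(b, x, y))  # average of the endpoint values
violates_convexity = lhs > rhs

print(lhs, rhs, violates_convexity)
```

Since the value at the midpoint exceeds the average of the endpoint values, this toy objective is not convex, so a bound that assumes convexity says nothing about it directly.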
No one has done this. I think someone is working on it.