The online learning example, though, seems very unambitious in its loss function. Ideally you’d do better than any individual estimator in the underlying data.
I think this is good intuition; I’ll just point out that the example I gave is much simpler than what you can actually ask for. For instance, if I want to be competitive with the best linear combination of at most k of the predictors, then I can do this with klog(n)/epsilon^2 rounds. If I want to be competitive with the best overall combination that uses all n predictors, I can do this with n/epsilon^2 rounds. The guarantees scale pretty gracefully with the problem instance.
(Another thing I’ll point out in passing is that you’re perfectly free to throw in “fraction of correct answers” as an additional predictor, although that doesn’t address the core of your point, though I think that the preceding paragraph does address it.)
Thanks! Glad you enjoyed it.
I think this is good intuition; I’ll just point out that the example I gave is much simpler than what you can actually ask for. For instance, if I want to be competitive with the best linear combination of at most k of the predictors, then I can do this with klog(n)/epsilon^2 rounds. If I want to be competitive with the best overall combination that uses all n predictors, I can do this with n/epsilon^2 rounds. The guarantees scale pretty gracefully with the problem instance.
(Another thing I’ll point out in passing is that you’re perfectly free to throw in “fraction of correct answers” as an additional predictor, although that doesn’t address the core of your point, though I think that the preceding paragraph does address it.)