Linked is my MSc thesis, where I do regret analysis for an infra-Bayesian[1] generalization of stochastic linear bandits.
The main significance that I see in this work is:
Expanding our understanding of infra-Bayesian regret bounds, and solidifying our confidence that infra-Bayesianism is a viable approach. Previously, the most interesting IB regret analysis we had was Tian et al., which deals (essentially) with episodic infra-MDPs. My work here doesn’t supersede Tian et al., because it only talks about bandits (i.e. stateless infra-Bayesian laws), but it complements it because it deals with a parametric hypothesis space (i.e. it fits the general theme in learning theory that generalization bounds should scale with the dimension of the hypothesis class).
Discovering some surprising features of infra-Bayesian learning that have no analogues in classical theory. In particular, it turns out that affine credal sets (i.e. those that are closed w.r.t. arbitrary affine combinations of distributions, and not just convex combinations) have better learning-theoretic properties, and the regret bound depends on additional parameters that don’t appear in classical theory (the “generalized sine” and the “generalized condition number”). Credal sets defined using conditional probabilities (related to Armstrong’s “model splinters”) turn out to be well-behaved in terms of these parameters.
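To make the convex/affine distinction above concrete: a credal set is by definition closed under convex combinations of its distributions, while an affine credal set is also closed under combinations whose weights sum to 1 but may be negative, and such combinations can leave the probability simplex (which is why signed distributions enter the picture). A toy numerical illustration, with the specific distributions being my own choices rather than anything from the thesis:

```python
import numpy as np

# Two points of the probability simplex over three outcomes.
p = np.array([0.9, 0.1, 0.0])
q = np.array([0.1, 0.9, 0.0])

def combine(w, p, q):
    """Weighted combination w*p + (1-w)*q; the weights always sum to 1,
    so it is affine, but it is convex only when 0 <= w <= 1."""
    return w * p + (1 - w) * q

convex = combine(0.5, p, q)  # [0.5, 0.5, 0.0] -- still a distribution
affine = combine(1.5, p, q)  # [1.3, -0.3, 0.0] -- only a signed "distribution"

assert abs(affine.sum() - 1.0) < 1e-9  # the coordinates still sum to 1...
assert affine.min() < 0                # ...but one of them is negative
```

Closure under the larger class of combinations is a genuinely stronger requirement, which is what gives affine credal sets their extra structure.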
In addition to the open questions in the “summary” section, there is also a natural open question of extending these results to non-crisp infradistributions. (I didn’t mention it in the thesis because it requires too much additional context to motivate.)
- ^
I use the word “imprecise” rather than “infra-Bayesian” in the title, because the proposed algorithm achieves a regret bound which is worst-case over the hypothesis class, so it’s not “Bayesian” in any non-trivial sense.
- ^
In particular, I suspect that there’s a flavor of homogeneous ultradistributions for which the parameter becomes unnecessary. Specifically, an affine ultradistribution can be thought of as the result of “take an affine subspace of the affine space of signed distributions, intersect it with the space of actual (positive) distributions, then take downwards closure into contributions to make it into a homogeneous ultradistribution”. But we can also consider the alternative “take an affine subspace of the affine space of signed distributions, take downwards closure into signed contributions and then intersect it with the space of actual (positive) contributions”. The order matters!
This work[1] was the first[2] foray into proving non-trivial regret bounds in the robust (infra-Bayesian) setting. The specific bound I got was later slightly improved in Diffractor’s and my later paper. This work studied a variant of linear bandits, for the usual reason that linear models are often studied in learning theory: they are a conveniently simple setting where we actually know how to prove things, even with computationally efficient algorithms. (Although we still don’t have a computationally efficient algorithm for the robust version: not because it’s very difficult, but (probably) just because nobody has gotten around to solving it.) As such, this work was useful as a toy-model test that infra-Bayesianism doesn’t run into statistical intractability issues. As to whether linear-model algorithms or their direct descendants will actually play a role in the ultimate theory of learning, that is still an open question.
- ^
An abridged version was also published as a paper in JMLR.
- ^
Other than Tian et al., which technically proves a robust regret bound, but was not framed by the authors as such (instead, their motivation was studying zero-sum games).
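For readers unfamiliar with the classical setting being generalized: in a stochastic linear bandit, the learner repeatedly picks an action vector, receives reward ⟨θ*, a⟩ plus noise for an unknown θ*, and regret is measured against the best fixed action. A minimal optimism-based (LinUCB-style) sketch of this non-robust setting; all specific numbers (the action set, confidence width β, ridge parameter, noise level) are illustrative choices of mine, not from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Classical stochastic linear bandit: reward = <theta_star, action> + noise.
d, T = 2, 2000
theta_star = np.array([1.0, 0.5])                          # unknown to the learner
actions = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])   # fixed finite action set

lam, beta = 1.0, 2.0       # ridge regularizer and (heuristic) confidence width
V = lam * np.eye(d)        # regularized Gram matrix of played actions
b = np.zeros(d)            # sum of reward-weighted actions

best = (actions @ theta_star).max()
regret = 0.0
for t in range(T):
    theta_hat = np.linalg.solve(V, b)      # ridge estimate of theta_star
    Vinv = np.linalg.inv(V)
    # Optimism: estimated reward plus an exploration bonus shrinking with data.
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', actions, Vinv, actions))
    a = actions[np.argmax(actions @ theta_hat + beta * bonus)]
    r = a @ theta_star + 0.1 * rng.standard_normal()
    V += np.outer(a, a)
    b += r * a
    regret += best - a @ theta_star

print(regret)  # grows sublinearly in T, so average regret per round is small
```

The robust version studied in the linked work replaces the single θ* with an adversarially chosen element of a credal set; the point of the toy model above is only to show the shape of the classical baseline.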