Maybe we won’t actually use “a Bayesian agent with prior P and decision rule D” (or whatever) as our AGI algorithm, because it’s not computationally optimal. But even so, we can still reason about what this algorithm would do. And whatever it would do, we can call that a “benchmark”! Then we can (1) prove theorems about this benchmark, and (2) prove theorems about how a different, non-Bayesian algorithm performs relative to that benchmark. (Hope I got that right!)
My claim is somewhat stronger than that. I claim that any reasonable RL algorithm must have Bayesian regret vanishing in the γ→1 limit relative to some natural prior; otherwise it’s not worthy of the description “reasonable RL algorithm”. Moreover, deep RL algorithms also satisfy such bounds, we just don’t know how to describe the right prior and the convergence rate yet (and we already have some leads: for example, Allen-Zhu et al. show that three-layer offline deep learning captures a hypothesis class that contains everything expressible by a similar network with sufficiently smooth activations).
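To make the regret claim concrete, here is a toy sketch (my own illustration, not the formalism the claim refers to): Thompson sampling on a two-armed Bernoulli bandit, with Bayesian regret averaged over environments drawn from a uniform prior on the arm means. Per-step Bayesian regret shrinking as the horizon grows serves as a finite-horizon stand-in for the γ→1 limit; all names and parameters below are illustrative.

```python
import random

def thompson_bandit(means, horizon, rng):
    """Run Thompson sampling on a 2-armed Bernoulli bandit; return cumulative regret."""
    # Beta(1,1) posterior per arm, tracked via success/failure counts.
    a, b = [1, 1], [1, 1]
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a mean from each arm's posterior; pull the arm with the largest sample.
        samples = [rng.betavariate(a[i], b[i]) for i in range(2)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < means[arm] else 0
        a[arm] += reward
        b[arm] += 1 - reward
        regret += best - means[arm]
    return regret

def bayesian_regret(horizon, n_envs=300, seed=0):
    """Per-step regret averaged over environments drawn from the (uniform) prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_envs):
        means = [rng.random(), rng.random()]  # prior: arm means ~ Uniform[0,1]
        total += thompson_bandit(means, horizon, rng)
    return total / (n_envs * horizon)
```

Since Thompson sampling’s cumulative Bayesian regret grows sublinearly in the horizon, `bayesian_regret(1000)` comes out well below `bayesian_regret(10)` — the per-step regret vanishes as the effective horizon grows, which is the flavor of bound the claim demands of any “reasonable” RL algorithm.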