I recently asked Dalcy if he had gotten any better answer to this question, and he said yes! This:
Foundations of Reinforcement Learning and Interactive Decision Making, by Foster and Rakhlin
At 159 pages, it’s derived from the lecture notes for the class they teach at MIT. It discusses the “Decision-Estimation Coefficient”, which (*checks notes*) @Diffractor thinks is the bee’s knees. Abstract:
These lecture notes give a statistical perspective on the foundations of reinforcement learning and interactive decision making. We present a unifying framework for addressing the exploration-exploitation dilemma using frequentist and Bayesian approaches, with connections and parallels between supervised learning/estimation and decision making as an overarching theme. Special attention is paid to function approximation and flexible model classes such as neural networks. Topics covered include multi-armed and contextual bandits, structured bandits, and reinforcement learning with high-dimensional feedback.
I learned the “Decision-Estimation Coefficient” under a different name: piKL (see also the more famous “Human-Regularized Diplomacy [abbrv.]” paper). Note that piKL uses the KL divergence instead of the Hellinger distance used in that text, its KL is flipped the wrong way (70% confidence), and it’s also more general, since it penalizes divergence from any anchor policy, not just a model from a couple of epochs ago; but they’re essentially the same.
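For concreteness, here is the rough correspondence as I understand it. Notation is mine rather than the book’s, the piKL form is from memory, and $f^M$, $\pi_M$, $\widehat{M}$, $Q$, and $\tau$ are stand-ins, so treat this as a sketch. The DEC in Foster–Rakhlin is a minimax value with a squared-Hellinger penalty against the current estimate $\widehat{M}$:

$$\mathrm{dec}_{\gamma}(\mathcal{M},\widehat{M}) \;=\; \inf_{p\in\Delta(\Pi)}\,\sup_{M\in\mathcal{M}}\; \mathbb{E}_{\pi\sim p}\!\left[f^{M}(\pi_{M}) - f^{M}(\pi) - \gamma\, D^{2}_{\mathrm{H}}\!\big(M(\pi),\widehat{M}(\pi)\big)\right],$$

while piKL picks a policy by maximizing expected value minus a scaled KL from a fixed anchor policy $\tau$ (e.g. a human-imitation model):

$$\pi_{\lambda} \;=\; \operatorname*{arg\,max}_{\pi\in\Delta(\mathcal{A})}\; \mathbb{E}_{a\sim\pi}\big[Q(a)\big] \;-\; \lambda\, D_{\mathrm{KL}}\big(\pi \,\|\, \tau\big).$$

Both have the shape “expected reward minus a multiplier times a divergence from a reference point”; the differences are exactly the ones noted above (Hellinger vs. KL, the direction of the divergence, and what the reference point is).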