programjames comments on Least-problematic Resource for learning RL?

programjames 12 Aug 2025 0:39 UTC
I learned the "Decision-Estimation Coefficient" under a different name: piKL (which is also used in the better-known paper "Human-Regularized Diplomacy [abbrv.]"). There are a few differences. piKL uses the KL divergence instead of the Hellinger distance used in that text, and its KL is flipped the wrong way (70% confidence). It is also more general, since it penalizes divergence from any anchor, not just from a model from a couple of epochs ago. Otherwise they're essentially the same.
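To make "divergence from any anchor" concrete, here is a minimal Python sketch of a KL-regularized policy. It assumes the single-step objective max over π of E_π[Q] − λ·KL(π‖τ), where τ is the anchor policy. The direction KL(π‖τ) is an assumption here. For this objective the maximizer has the standard closed form π ∝ τ·exp(Q/λ). piKL itself computes its policy iteratively, so treat this as an illustration of the regularizer, not that paper's algorithm. The anchor, action values, and λ below are hypothetical, and `hellinger_sq` is included only so the two divergences can be compared side by side.

```python
import math


def kl(p, q):
    # KL(p || q) = sum_a p(a) * log(p(a) / q(a)); terms with p(a) = 0 contribute 0.
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)


def hellinger_sq(p, q):
    # Squared Hellinger distance, using the convention sum_a (sqrt(p(a)) - sqrt(q(a)))^2
    # (some texts include an extra factor of 1/2).
    return sum((math.sqrt(pa) - math.sqrt(qa)) ** 2 for pa, qa in zip(p, q))


def kl_regularized_policy(q_values, anchor, lam):
    # Maximizer of E_{a~pi}[Q(a)] - lam * KL(pi || anchor), ASSUMING this KL direction:
    # pi(a) is proportional to anchor(a) * exp(Q(a) / lam). Computed in log space for stability.
    logits = [math.log(t) + qv / lam for t, qv in zip(anchor, q_values)]
    m = max(logits)
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def objective(pi, q_values, anchor, lam):
    # Expected value under pi minus the KL penalty toward the anchor.
    return sum(p * qv for p, qv in zip(pi, q_values)) - lam * kl(pi, anchor)


if __name__ == "__main__":
    anchor = [0.5, 0.3, 0.2]    # hypothetical anchor policy (e.g. a human-imitation model)
    q_values = [1.0, 0.0, 0.5]  # hypothetical action values
    for lam in (0.1, 1.0, 10.0):
        pi = kl_regularized_policy(q_values, anchor, lam)
        print(lam, [round(x, 3) for x in pi],
              round(kl(pi, anchor), 4), round(hellinger_sq(pi, anchor), 4))
```

Large λ keeps the policy close to the anchor, and small λ approaches the greedy argmax of Q. Swapping the penalty for Hellinger (or flipping the KL direction) changes the regularizer, and the closed form above no longer applies.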