Halpern and Leung propose the “minimax weighted expected regret” (MWER) decision rule, which is a weighted generalization of the minimax-expected-regret (MER) decision rule. In contrast, our decision rule is a weighted generalization of maximin expected utility (MMEU). The problem with MER is that it interacts poorly with learning. The closest thing to learning with MER is adversarial bandits. However, adversarial regret is statistically intractable for Markov decision processes. And even in the bandit setting, interpreting adversarial regret in a principled decision-theoretic way requires a hidden assumption that the adversary is oblivious.
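To make the contrast concrete, here is a minimal numerical sketch of the two unweighted rules, MER and MMEU, over a small credal set of priors. The payoff matrix and priors are invented for illustration only; this is neither Halpern and Leung's weighted (MWER) rule nor ours, just the base rules they each generalize:

```python
import numpy as np

# Hypothetical example: 2 actions x 2 states payoff matrix,
# and a credal set consisting of two candidate priors over states.
U = np.array([[0.0, 100.0],
              [30.0, 40.0]])
priors = np.array([[0.9, 0.1],
                   [0.1, 0.9]])

# Expected utility of each action under each prior: shape (actions, priors).
EU = U @ priors.T

# MMEU: choose the action whose worst-case expected utility
# over the credal set is largest.
mmeu_action = int(np.argmax(EU.min(axis=1)))

# MER: the regret of an action under a prior is its shortfall from the
# best achievable expected utility under that prior; choose the action
# whose worst-case expected regret over the credal set is smallest.
regret = EU.max(axis=0) - EU
mer_action = int(np.argmin(regret.max(axis=1)))

print(mmeu_action, mer_action)  # the two rules pick different actions here
```

Note that MMEU picks the safe action (its worst-case expected utility is 31 rather than 10), while MER picks the risky one (its worst-case expected regret is 21 rather than 51), illustrating that the two rules can genuinely disagree on the same credal set.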