The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That’s different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.
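To make the many-round setting concrete, here is a minimal epsilon-greedy sketch of exploration in a Bernoulli bandit (the function name, parameters, and arm payoffs are illustrative, not from the text): early random pulls buy information about each arm's payoff, which later greedy pulls exploit.

```python
import random

def run_epsilon_greedy(true_means, epsilon=0.1, rounds=10_000, seed=0):
    """Epsilon-greedy on a Bernoulli bandit: explore a random arm with
    probability epsilon, otherwise exploit the best empirical mean so far."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms      # pulls per arm
    values = [0.0] * n_arms    # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]    # running mean
        total_reward += reward
    return counts, total_reward

counts, total = run_epsilon_greedy([0.3, 0.5, 0.7])
```

After enough rounds the best arm (here the one with mean 0.7) ends up with the most pulls: the information gathered while exploring pays off in every later round, which is exactly what a one-shot problem cannot offer.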