cousin_it comments on An environment for studying counterfactuals

cousin_it 11 Jul 2018 0:44 UTC
LW: 6 AF: 2
I’m pretty sure exploration is a hack. Trembling hand shouldn’t be required for good decisions. The right decision theory should make do with the natural amount of uncertainty: “I’m not sure what I’ll do because I haven’t finished thinking and could still stumble on a good argument for any of the options.” That’s the kind of thing I’d want to see formalized.
- Diffractor 11 Jul 2018 3:48 UTC
  LW: 2 AF: 1
  If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploring apparently suboptimal levers to keep a spurious underestimate of a lever’s value from becoming permanent?
  - AlexMennen 11 Jul 2018 4:55 UTC
    LW: 5 AF: 2
    The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That’s different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.
- Nisan 11 Jul 2018 2:30 UTC
  LW: 2 AF: 1
  I agree exploration is a hack. I think exploration vs. other sources of non-dogmatism is orthogonal to the question of counterfactuals, so I’m happy to rely on exploration for now.
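
For readers who want the multi-armed bandit point from the thread made concrete, here is a minimal sketch in Python. It is not taken from the post or from any commenter. The two levers, their payoff probabilities, the starting observations, and the function name `run_bandit` are all assumptions chosen for illustration. It uses epsilon-greedy as one simple form of exploration and shows how a purely greedy agent can get stuck on a spurious underestimate of a lever, while a small exploration rate lets later rounds correct it.

```python
import random


def run_bandit(epsilon, rounds=2000, seed=0):
    """Two-lever Bernoulli bandit with epsilon-greedy selection.

    All numbers here are hypothetical and exist only for illustration.
    """
    rng = random.Random(seed)
    true_means = {"A": 0.5, "B": 0.8}  # assumed payoff probabilities

    # Assumed starting point: one lucky draw on A (reward 1) and one
    # unlucky draw on B (reward 0), so B starts out underestimated.
    pulls = {"A": 1, "B": 1}
    totals = {"A": 1.0, "B": 0.0}

    for _ in range(rounds):
        estimates = {arm: totals[arm] / pulls[arm] for arm in pulls}
        if rng.random() < epsilon:
            # Explore: pick a lever uniformly at random.
            arm = rng.choice(["A", "B"])
        else:
            # Exploit: pick the lever with the highest current estimate.
            arm = max(estimates, key=estimates.get)
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        totals[arm] += reward

    estimates = {arm: totals[arm] / pulls[arm] for arm in pulls}
    return pulls, estimates


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        pulls, estimates = run_bandit(eps)
        print(f"epsilon={eps}: pulls={pulls}, estimates={estimates}")
```

With `epsilon=0.0` the agent never pulls B again, so B's estimate stays at 0 even though B is the better lever under the assumed payoffs. With `epsilon=0.1` B gets sampled again and its estimate can recover. This only illustrates the many-round case: the information gained by exploring pays off in later rounds, which is the distinction drawn in the thread between bandit problems and one-shot problems.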