I’m pretty sure exploration is a hack. A trembling hand shouldn’t be required for good decisions. The right decision theory should make do with the natural amount of uncertainty: “I’m not sure what I’ll do because I haven’t finished thinking and could still stumble on a good argument for any of the options.” That’s the kind of thing I’d want to see formalized.
If exploration is a hack, then why do pretty much all multi-armed bandit algorithms rely on exploring seemingly suboptimal levers to prevent spurious underestimates of a lever’s value?
The multi-armed bandit problem is a many-round problem in which actions in early rounds provide information that is useful for later rounds, so it makes sense to explore to gain this information. That’s different from using exploration in one-shot problems to make the counterfactuals well-defined, which is a hack.
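To make the many-round case concrete, here is a minimal sketch of an epsilon-greedy two-lever bandit. Everything in it is an illustrative assumption rather than anything from the discussion above: the function name `run_bandit`, the payoff probabilities, and the starting estimates, which are chosen so that the better lever begins with a spurious underestimate (as if its first pull had happened to pay nothing). It only illustrates why exploration earns its keep when early pulls inform later ones; it says nothing about the one-shot counterfactual case.

```python
import random

def run_bandit(epsilon, steps=5000, seed=0):
    """Epsilon-greedy on a two-lever Bernoulli bandit (illustrative setup)."""
    rng = random.Random(seed)
    true_means = [0.5, 0.8]   # assumed payoff probabilities; lever 1 is better
    estimates = [0.5, 0.0]    # lever 1 starts with a spurious underestimate
    counts = [1, 1]           # each starting estimate counts as one observation
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick a lever uniformly at random.
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the lever with the highest current estimate.
            arm = max(range(len(true_means)), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean for the pulled lever.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

greedy_est, greedy_counts = run_bandit(epsilon=0.0)
explore_est, explore_counts = run_bandit(epsilon=0.1)
print("greedy: ", greedy_est, greedy_counts)
print("explore:", explore_est, explore_counts)
```

With `epsilon=0.0` the agent never pulls lever 1 again, so its estimate stays stuck at the unlucky starting value. With `epsilon=0.1` the occasional exploratory pulls give the estimate a chance to recover, after which the greedy step starts preferring lever 1.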
I agree exploration is a hack. I think exploration vs. other sources of non-dogmatism is orthogonal to the question of counterfactuals, so I’m happy to rely on exploration for now.