What’s the term for statistical problems that are like exploration-exploitation, but without the exploitation? I tried searching for ‘exploration’ but that wasn’t it.
In particular, suppose I have a bunch of machines, each of which succeeds or fails independently with its own fixed probability. And suppose I can pick machines to sample to see whether they succeed or fail. How do I sample them if I want to become 99% certain that I’ve found the best machine, using the fewest samples?
The difference from exploration-exploitation is that this is just a trial period, and I don’t care how many successes I get during the testing. So I want something like Thompson sampling, but for my purposes Thompson sampling oversamples the machine it currently thinks is best, because it values collecting successes rather than ruling out the second-best options.
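The oversampling behaviour described above is easy to see in simulation. Here is a minimal sketch of Thompson sampling on Bernoulli machines (the three success rates are made up for illustration; uniform Beta(1, 1) priors are assumed):

```python
import random

def thompson_step(successes, failures):
    """Pick the next machine to sample under Thompson sampling.

    successes[i]/failures[i] are the counts observed so far for machine i,
    giving a Beta(successes[i]+1, failures[i]+1) posterior on its success
    probability (uniform prior). Draw one sample from each posterior and
    play the machine with the highest draw.
    """
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

# Short simulation: three machines with fixed (hidden) success rates.
random.seed(0)
rates = [0.3, 0.5, 0.7]
succ = [0] * 3
fail = [0] * 3
for _ in range(1000):
    i = thompson_step(succ, fail)
    if random.random() < rates[i]:
        succ[i] += 1
    else:
        fail[i] += 1
# Most pulls end up concentrated on the best machine (index 2) --
# good for reward, but wasteful if the goal is only to rule out
# the runners-up with high confidence.
```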
From Algorithms to Live By, I vaguely recall the multi-armed bandit problem. Maybe that’s what you’re looking for? Or is that still too closely tied to the explore-exploit paradigm?
Right. The setup for my problem is the same as the ‘Bernoulli bandit’, but I only care about the information, not the reward. All I see on that page is about exploration-exploitation.
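One way to make the “99% certain” criterion concrete (a sketch of an obvious approach, not a recommendation from any particular source): keep a Beta posterior on each machine’s success rate, estimate by Monte Carlo the posterior probability that each machine is the best, and stop sampling once some machine’s probability exceeds 0.99.

```python
import random

def prob_best(successes, failures, n_draws=10000):
    """Monte Carlo estimate of P(machine i has the highest success rate),
    given independent Beta(s+1, f+1) posteriors (uniform priors)."""
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        draws = [random.betavariate(s + 1, f + 1)
                 for s, f in zip(successes, failures)]
        wins[max(range(k), key=draws.__getitem__)] += 1
    return [w / n_draws for w in wins]

# Example: after observing 80/100 vs 40/100 successes, how sure are we
# that the first machine is the better one?
random.seed(1)
p = prob_best([80, 40], [20, 60])
# p[0] should be very close to 1 here, so sampling could stop.
```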
I got a good answer here: https://stats.stackexchange.com/q/579642/5751