What’s the term for statistical problems that are like exploration-exploitation, but without the exploitation? I tried searching for ‘exploration’ but that wasn’t it.
In particular, suppose I have a bunch of machines, each of which succeeds or fails independently with its own fixed probability. And suppose I can pick machines to sample to see whether they succeed or fail. How do I sample them if I want to become 99% certain that I’ve found the best machine, using the fewest samples?
The difference from exploration-exploitation is that this is just a trial period, and I don’t care how many successes I get during the testing. So I want something like Thompson sampling, but for my purposes Thompson sampling oversamples the machine it currently thinks is best, because it values collecting successes rather than ruling out the second-best options.
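The oversampling behaviour described above is easy to see in simulation. Here is a minimal sketch of Thompson sampling on Bernoulli machines (the three success rates are made up for illustration; uniform Beta(1, 1) priors are assumed):

```python
import random

def thompson_step(successes, failures):
    """Pick the next machine to sample under Thompson sampling.

    successes[i]/failures[i] are the counts observed so far for machine i,
    giving a Beta(successes[i]+1, failures[i]+1) posterior on its success
    probability (uniform prior). Draw one sample from each posterior and
    play the machine with the highest draw.
    """
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

# Short simulation: three machines with fixed (hidden) success rates.
random.seed(0)
rates = [0.3, 0.5, 0.7]
succ = [0] * 3
fail = [0] * 3
for _ in range(1000):
    i = thompson_step(succ, fail)
    if random.random() < rates[i]:
        succ[i] += 1
    else:
        fail[i] += 1
# Most pulls end up concentrated on the best machine (index 2) --
# good for reward, but wasteful if the goal is only to rule out
# the runners-up with high confidence.
```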
From Algorithms to Live By, I vaguely recall the multi-armed bandit problem. Maybe that’s what you’re looking for? Or is that still too closely tied to the explore-exploit paradigm?
Right. The setup for my problem is the same as the ‘Bernoulli bandit’, but I only care about the information, not the reward. All I see on that page is about exploration-exploitation.
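One way to make the “99% certain” criterion concrete (a sketch of an obvious approach, not a recommendation from any particular source): keep a Beta posterior on each machine’s success rate, estimate by Monte Carlo the posterior probability that each machine is the best, and stop sampling once some machine’s probability exceeds 0.99.

```python
import random

def prob_best(successes, failures, n_draws=10000):
    """Monte Carlo estimate of P(machine i has the highest success rate),
    given independent Beta(s+1, f+1) posteriors (uniform priors)."""
    k = len(successes)
    wins = [0] * k
    for _ in range(n_draws):
        draws = [random.betavariate(s + 1, f + 1)
                 for s, f in zip(successes, failures)]
        wins[max(range(k), key=draws.__getitem__)] += 1
    return [w / n_draws for w in wins]

# Example: after observing 80/100 vs 40/100 successes, how sure are we
# that the first machine is the better one?
random.seed(1)
p = prob_best([80, 40], [20, 60])
# p[0] should be very close to 1 here, so sampling could stop.
```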
I got a good answer here: https://stats.stackexchange.com/q/579642/5751