gjm comments on Rationality Reading Group: Part V: Value Theory

gjm 20 Mar 2016 3:03 UTC
1 point
0
There’s been quite a lot of work on this sort of question, under the title of “Multi-armed bandits”. (As opposed to the “one-armed bandits” you find rows and rows of in casinos.)
- Gram_Stone 20 Mar 2016 3:44 UTC
  0 points
  0
  Your response is very different from mine, so I’m wondering if I’m wrong.
  - gjm 20 Mar 2016 15:00 UTC
    2 points
    0
    The multi-armed bandit scenario applies when you are uncertain about the distributions produced by these options, and are going to have lots of interactions with them that you can use to discover more about them while extracting utility.
    
    For a one-shot game, or if you already know the distribution of utilities each option will keep producing every time, you just compute the expected utility of each option, pick the highest, and you’re done.
    
    But suppose you know that each option produces some distribution of utilities without yet knowing what that distribution is (maybe you know they’re all normally distributed and have some guess at the means and variances), and you get to interact with the options over and over again. Then you will probably begin by trying them all a few times to get a sense of what they do, and as you learn more you will gradually shift from gaining knowledge (and hence expected utility in the future) toward maximizing expected utility this turn.
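
To make the explore-then-exploit behaviour described above concrete, here is a minimal sketch using Thompson sampling, which is one standard bandit strategy and not necessarily the one the comment has in mind. It assumes normally distributed rewards with a known noise standard deviation and a normal prior guess for each option's mean. The function name `thompson_gaussian` and all parameter names and default values are hypothetical, chosen only for illustration.

```python
import random


def thompson_gaussian(true_means, noise_sd=1.0, prior_mean=0.0,
                      prior_sd=10.0, turns=2000, seed=0):
    """Play a Gaussian multi-armed bandit with Thompson sampling.

    Assumptions (illustrative only): rewards are Normal(true_mean, noise_sd),
    noise_sd is known, and the prior on every mean is
    Normal(prior_mean, prior_sd).
    """
    rng = random.Random(seed)
    k = len(true_means)
    # Belief about each option's mean: Normal(post_mean[i], 1 / post_prec[i]).
    post_mean = [prior_mean] * k
    post_prec = [1.0 / prior_sd ** 2] * k
    noise_prec = 1.0 / noise_sd ** 2
    pulls = [0] * k
    total = 0.0
    for _ in range(turns):
        # Draw one plausible mean per option from the current beliefs.
        # Wide beliefs give scattered draws (exploration); narrow beliefs
        # make the draw track the estimated mean (exploitation).
        draws = [rng.gauss(post_mean[i], post_prec[i] ** -0.5)
                 for i in range(k)]
        arm = max(range(k), key=lambda i: draws[i])
        reward = rng.gauss(true_means[arm], noise_sd)
        # Conjugate normal update for a known noise variance.
        new_prec = post_prec[arm] + noise_prec
        post_mean[arm] = (post_prec[arm] * post_mean[arm]
                          + noise_prec * reward) / new_prec
        post_prec[arm] = new_prec
        pulls[arm] += 1
        total += reward
    return pulls, post_mean, total


if __name__ == "__main__":
    pulls, estimates, total = thompson_gaussian([0.0, 0.5, 2.0])
    print("pulls per option:", pulls)
    print("estimated means:", [round(m, 2) for m in estimates])
```

No explicit schedule is needed for the shift from learning to earning: as an option's belief narrows, its draws stop straying far from its estimated mean, so clearly worse options are chosen less and less often. In the one-shot case with known distributions, the loop collapses to picking the option with the highest expected utility.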