jessicata comments on Online Learning 2: Bandit learning with catastrophes

jessicata 1 Nov 2016 4:34 UTC
The main qualitative difference from sampling from $\hat{q}_i$ is that we’re targeting a specific tradeoff between catastrophes and reward, rather than zero probability of catastrophe. I agree that when $\tau = 0$ we’re just sampling from $\hat{q}_i$.