paulfchristiano comments on Online Learning 2: Bandit learning with catastrophes

paulfchristiano 31 Oct 2016 4:38 UTC
0 points
(I meant sampling $x$ repeatedly from the distribution $\hat{q}_i$, I agree that sampling $x$ at random won’t help identify rare catastrophes.)
- jessicata 1 Nov 2016 4:34 UTC
  LW: 1 AF: 1
  The main qualitative difference from sampling from $\hat{q}_i$ is that we’re targeting a specific tradeoff between catastrophes and reward, rather than zero probability of catastrophe. I agree that when $\tau = 0$ we’re just sampling from $\hat{q}_i$.