The main qualitative difference from sampling from q̂_i is that we’re targeting a specific tradeoff between catastrophes and reward, rather than zero probability of catastrophe. I agree that when τ=0 we’re just sampling from q̂_i.