RyanCarey comments on Comparing reward learning/reward tampering formalisms

RyanCarey 10 Jun 2020 16:54 UTC
It would be nice to draw out this distinction in more detail. One guess:
- Uninfluenceability seems similar to requiring zero individual treatment effect of D on R.
- Unriggability (from the paper) would then correspond to zero average treatment effect of D on R.
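The sketch below illustrates the gap between the two conditions in a toy structural causal model. It assumes a binary decision D, a single exogenous coin U, and a scalar R standing in for the learned reward. The paper's objects are reward functions, not scalars. All function names and the XOR outcome are hypothetical and are not taken from the paper. Zero individual effects always imply a zero average effect. The XOR case shows that the converse fails: D changes R for every u, but the changes cancel in expectation.

```python
from fractions import Fraction

# Hypothetical toy SCM (an illustrative assumption, not the paper's setup):
# exogenous U is uniform over {0, 1}, decision D is in {0, 1},
# and the outcome R = f(D, U) stands in for the learned reward as a scalar.
U_VALUES = (0, 1)
P_U = {0: Fraction(1, 2), 1: Fraction(1, 2)}


def outcome_xor(d: int, u: int) -> int:
    """R depends on D for every u, but the direction of the effect flips with u."""
    return d ^ u


def outcome_constant(d: int, u: int) -> int:
    """R is determined by U alone, so D has no effect on any unit."""
    return u


def individual_effects(f):
    """Individual treatment effect for each exogenous setting u: f(1, u) - f(0, u)."""
    return {u: f(1, u) - f(0, u) for u in U_VALUES}


def average_effect(f):
    """Average treatment effect: the ITE averaged over the distribution of U."""
    return sum(P_U[u] * (f(1, u) - f(0, u)) for u in U_VALUES)


if __name__ == "__main__":
    for name, f in [("xor", outcome_xor), ("constant", outcome_constant)]:
        ite = individual_effects(f)
        ate = average_effect(f)
        zero_ite = all(v == 0 for v in ite.values())
        print(f"{name}: ITE={ite}, ATE={ate}, zero ITE={zero_ite}, zero ATE={ate == 0}")
```

Under this analogy, the XOR model is like a process that is unriggable but still influenceable. The constant model is like one that is both uninfluenceable and unriggable.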