RyanCarey comments on Comparing reward learning/reward tampering formalisms