Stuart_Armstrong comments on Comparing reward learning/reward tampering formalisms