Stuart_Armstrong comments on Reward/value learning for reinforcement learning