Seth Herd comments on “Behaviorist” RL reward functions lead to scheming