Seth Herd comments on Human preferences as RL critic values—implications for alignment