Steven Byrnes comments on Why almost every RL agent does learned optimization