Stephen McAleese comments on Reward IS the Optimization Target