Rohin Shah comments on Learning biases and rewards simultaneously