Scott Garrabrant comments on Announcement: AI alignment prize round 2 winners and next round

Scott Garrabrant 17 Apr 2018 1:41 UTC
15 points
0
I think you want to reward output rather than output that would not have otherwise happened.
This is similar to the fact that if you want to train calibration, you have to optimize you log score and just observe your lack of calibration as an opportunity to increase your log score.