rohinmshah comments on [AN #106]: Evaluating generalization ability of learned reward models