I like that we’re having a discussion about what sort of guidelines we could theoretically establish, but I don’t think this is quite there yet. One problem I see that hasn’t been mentioned in the comments yet is that this seems quite gameable by motivated devs if they got to the point where they could almost, but not quite, pass. You could deliberately tune the model to be more predictable ‘in distribution’ while it remains unpredictable ‘out of distribution’. I think a lot of the concern comes from unexpected behaviors arising ‘out of distribution’, when the deployed model encounters inputs not anticipated by the test questions.