Has the team tried noise injection as a way to detect whether a model is sandbagging in these different scenarios?