Buck comments on Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

Buck 26 Jul 2023 17:54 UTC
I’d say the main point here is that I don’t want to rely on my ability to extrapolate anything about how the model behaves in “unseen situations”; instead, I want to run this eval in every situation where I’m deploying my model.