Orpheus16 comments on Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

Orpheus16 27 Apr 2023 20:11 UTC
2 points
Nice, very relevant. I agree with Evan that arguments about the training procedure will be relevant (I’m more uncertain about whether checking for deception behaviorally will be harder than avoiding it, but it certainly seems plausible).
Ideally, I think regulators would be flexible about the kinds of evidence they accept. If a developer’s evidence that the model is not deceptive rests on details of the training procedure rather than on behavioral testing, that could be sufficient.
(In fact, I think a case that meets some sort of “beyond-a-reasonable-doubt” threshold would likely involve explaining why the training procedure avoids deceptive alignment.)