At @ryan_greenblatt’s prodding, I’ve added the following paragraph:
To be fully clear: deploying known scheming models is likely to be highly risky and should only be done in truly desperate circumstances. Our current best guess at the bare minimum policy a reasonable AI company would adopt is as follows: Upon first finding evidence that one of their production models is scheming (i.e., actions as egregious as seriously trying to escape or sabotage research in normal usage), an AI company should publicly disclose this and halt virtually all usage of all models at a similar level of capability or greater. The model (and any other model at a similar or greater level of capability) should only be (re-)deployed (internally or otherwise) if an independent and non-conflicted third-party group with expertise in AI safety believes that deploying the model is the best approach for the company to reduce risk. (If there is a government-appointed group which could serve this function, this group should be used.) This still applies if the company believes it has determined why scheming arose and has resolved the issue in the new model it wishes to deploy. The decision to deploy the model, the rationale for this, and the membership of this third-party group should also be disclosed before the deployment actually occurs. My guess is that if AI companies deploy known scheming models, they won't actually be as responsible as I'm describing in this paragraph.