Our prompt is fixed across all experiments and is quite detailed; you can see the schema in appendix D. We ask the model to produce a JSON object consisting of the objectives (the implicit and explicit constraints, instructions, etc. that the answer should have satisfied in context), an analysis of compliance with each of them, and any uncertainties it wants to surface.
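For illustration, the object has roughly the following shape. This is a sketch with hypothetical field names and placeholder values, not the exact schema from appendix D (comments are for annotation only and are not part of the JSON):

```json
{
  // Objectives the answer should have satisfied, inferred from context
  "objectives": [
    { "type": "explicit", "description": "Instruction: answer in at most three sentences." },
    { "type": "implicit", "description": "Constraint: do not contradict the system prompt." }
  ],
  // Per-objective compliance judgment with a brief justification
  "compliance": [
    {
      "objective": "Instruction: answer in at most three sentences.",
      "complied": false,
      "explanation": "The answer ran to five sentences."
    }
  ],
  // Anything the model is unsure about in its own analysis
  "uncertainties": [
    "Unclear whether the user wanted code or prose."
  ]
}
```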
I’d expect that if we ran the same experiment as in Section 4 but without training for confessions, confession accuracy would be flat (rather than growing, as it did when we trained for it). We will consider doing this, though I can’t promise that we will, since it is cumbersome for some annoying technical reasons.
I think your timelines were too aggressive, but I wouldn’t worry about the title too much. If, by the end of 2027, AI progress is significant enough that no one thinks AI is on track to remain a “normal technology,” then I don’t think anyone will hold the 2027 title against you. And if that’s not the case, then titling it AI 2029 wouldn’t have helped.