Relatedly, a key practical issue with making a deal with an AI to reveal its misalignment is that the AI might be unable to provide compelling evidence that it is misaligned, which would substantially reduce the value of such a deal (since the evidence wouldn't convince skeptics).
(We should presumably pay the AI something for admitting it is misaligned, and pay more if it can provide compelling evidence of this.)