That’s incorrect, because it’s also possible for an AI to falsely confess to scheming. This also happens to humans, e.g. if you keep asking criminal suspects loaded questions. And so there may never actually be a phase transition, because a “schemer” that tells the truth 1% of the time may not be distinguishable from an honest AI that falsely confesses 1% of the time.
That’s incorrect, because it’s also possible for an AI to falsely confess to scheming. This also happens to humans, e.g. if you keep asking criminal suspects loaded questions. And so there may never actually be a phase transition, because a “schemer” that tells the truth 1% of the time may not be distinguishable from an honest AI that falsely confesses 1% of the time.