Richard_Ngo comments on Richard Ngo’s Shortform

Richard_Ngo 6 Feb 2026 8:35 UTC
3 points
0
That’s incorrect, because it’s also possible for an AI to falsely confess to scheming. This also happens to humans, e.g. if you keep asking criminal suspects loaded questions. And so there may never actually be a phase transition, because a “schemer” that tells the truth 1% of the time may not be distinguishable from an honest AI that falsely confesses 1% of the time.