Lacking access to the other’s hardware, I think you’d need something that’s easy to compute for an honest AI, but hard to compute for a deceptive AI. Because a deceptive AI could always just simulate an honest AI, how do you distinguish simulation?
The only way I can think of is resource constraints. Deception adds a slight overhead to calculating quantities that depends in detail on the state of the AI. If you know what computational capabilities the other AI has very precisely, and you can time your communications with it, then maybe it can compute something for you that you can later verify implies honesty.
Lacking access to the other’s hardware, I think you’d need something that’s easy to compute for an honest AI, but hard to compute for a deceptive AI. Because a deceptive AI could always just simulate an honest AI, how do you distinguish simulation?
The only way I can think of is resource constraints. Deception adds a slight overhead to calculating quantities that depends in detail on the state of the AI. If you know what computational capabilities the other AI has very precisely, and you can time your communications with it, then maybe it can compute something for you that you can later verify implies honesty.
That’s an interesting idea!