The intuition was that “having lied” (or, having a lie present in the context) should probably change an LLM’s downstream outputs (because, in the training data, liars behave differently than non-liars).
As for the ambiguous elicitation questions, these were originally a sanity check; see the second point in the screenshot.