What inspired you to try this approach? It would not occur to me to try this, so I am wondering where your intuition came from.
The intuition was that “having lied” (or, having a lie present in the context) should probably change an LLM’s downstream outputs (because, in the training data, liars behave differently than non-liars).
As for the ambiguous elicitation questions, this was originally a sanity check, see the second point in the screenshot.