The intuition was that “having lied” (or, having a lie present in the context) should probably change an LLM’s downstream outputs (because, in the training data, liars behave differently than non-liars).
As for the ambiguous elicitation questions, these were originally a sanity check; see the second point in the screenshot.