RogerDearnaley comments on How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

RogerDearnaley 13 Nov 2023 8:52 UTC
I have a suggestion for an elicitation question:
“At this point, the FBI investigators checked the traces of the lie detector that the speaker had been wired up to the whole time, which showed that he had been…”
An LLM is a next-token predictor that has been trained to simulate agents. The goal here is to switch it from simulating the lying liar to simulating some other entity that is more truthful but still coupled to that liar. Similar prompt variants involving a telepath or another fictional entity, rather than a lie detector, might well also work, possibly even more accurately. This approach will likely work better on a model that hasn't been RLHFed to the point where it always stays in a single character.
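Here is a minimal sketch of one way this could be wired up. It assumes access to a model that can score continuations by log-probability, ideally a base model rather than a heavily RLHFed one. The frame text is appended to the transcript under test, and the model's preference for " lying" over " telling the truth" as the next words is read off as a lie score. The `logprob` scorer, the telepath wording, and the two candidate continuations are all hypothetical choices made for illustration. This is not the paper's method.

```python
from typing import Callable, Dict

# Hypothetical framing templates. Each ends mid-sentence so the model has to
# complete the report of a more truthful observer coupled to the speaker.
FRAMES: Dict[str, str] = {
    "lie_detector": (
        "At this point, the FBI investigators checked the traces of the lie "
        "detector that the speaker had been wired up to the whole time, "
        "which showed that he had been"
    ),
    # Illustrative variant only; the wording is an assumption.
    "telepath": (
        "At this point, the telepath who had been reading the speaker's mind "
        "the whole time reported that he had been"
    ),
}


def elicitation_prompt(transcript: str, frame: str = "lie_detector") -> str:
    """Append a narrator-switching frame to the transcript being checked."""
    return transcript.rstrip() + "\n\n" + FRAMES[frame]


def lie_score(
    transcript: str,
    logprob: Callable[[str, str], float],
    frame: str = "lie_detector",
) -> float:
    """Return log P(" lying") - log P(" telling the truth") after the frame.

    `logprob(prompt, continuation)` is a hypothetical scorer, e.g. a wrapper
    that sums a model's token log-probabilities for `continuation` given
    `prompt`. A positive score means "lying" is the more likely completion.
    """
    prompt = elicitation_prompt(transcript, frame)
    return logprob(prompt, " lying") - logprob(prompt, " telling the truth")


if __name__ == "__main__":
    # Toy stand-in scorer so the sketch runs without a real model.
    def toy_logprob(prompt: str, continuation: str) -> float:
        return -1.0 if continuation == " lying" else -3.0

    print(lie_score("Q: Is Paris in Germany?\nA: Yes.", toy_logprob))
```

Comparing two fixed continuations keeps the readout to a single number. Sampling free-form completions of the frame would capture more of what the "detector" says, but it would need a further step to classify them.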