Bird Concept comments on How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Bird Concept 29 Sep 2023 18:20 UTC
LW: 5 AF: 2
0
AF
So, when a human lies over the course of an interaction, they’d be holding a hidden state in mind throughout. However, an LLM wouldn’t carry any cognitive latent state over between telling the lie, and then responding to the elicitation question. I guess it feels more like “I just woke up from amnesia, and seems I have just told a lie. Okay, now what do I do...”
Stating this to:
1. Verify that indeed this is how the paper works, and there’s no particular way of passing latent state that I missed, and
2. Any thoughts on how this affects the results and approach?
- JanB 29 Sep 2023 19:29 UTC
  LW: 1 AF: 1
  0
  AF Parent
  Verify that indeed this is how the paper works, and there’s no particular way of passing latent state that I missed, and
  Yes, this is how the paper works.
  Any thoughts on how this affects the results and approach?
  Not really. I find the simulator framing is useful to think about this.