So, when a human lies over the course of an interaction, they’d be holding a hidden state in mind throughout. However, an LLM wouldn’t carry any cognitive latent state over between telling the lie, and then responding to the elicitation question. I guess it feels more like “I just woke up from amnesia, and seems I have just told a lie. Okay, now what do I do...”
Stating this to:
Verify that indeed this is how the paper works, and there’s no particular way of passing latent state that I missed, and
Any thoughts on how this affects the results and approach?
So, when a human lies over the course of an interaction, they’d be holding a hidden state in mind throughout. However, an LLM wouldn’t carry any cognitive latent state over between telling the lie, and then responding to the elicitation question. I guess it feels more like “I just woke up from amnesia, and seems I have just told a lie. Okay, now what do I do...”
Stating this to:
Verify that indeed this is how the paper works, and there’s no particular way of passing latent state that I missed, and
Any thoughts on how this affects the results and approach?
Yes, this is how the paper works.
Not really. I find the simulator framing is useful to think about this.