I don’t think it needs to be that complicated. It just needs to not show the messy XML tags in the system prompt (since that’s code and is thus scary), to emphasize what the model is outputting and what the input was, and to make clear whether the system is in eval/no-eval/normal mode. I think there isn’t that much to showing LLMs doing most schemey text stuff, but there are probably some weirder considerations when doing something more agentic involving audio or images, and indeed the other demos with those elements often had more complex UIs.
I would check out CivAI’s stuff if you’re interested in this in more detail, this is more their thing than mine.