Curated. I thought this was a pretty interesting result. I’m not sure if I should have been surprised by it, but I was. They also point to a decent amount of interesting follow-up work, though I expect that to generalise less well than “existence proof” papers like this one.
I often enjoy this group’s work finding interesting “info leaks” in LLM behaviour, like their previous lie detector work.