I’ve heard that GPT-4 base sometimes intuits that it’s non-human, but I haven’t replicated this myself or seen the actual logs. Still, if true, it seems like decent evidence against both of these.
I suspect current models often “cheat” by simply checking whether there’s anything in context at all. Most evals run on fresh instances, while typical use accumulates casual or irrelevant stuff in context pretty quickly.
I heard that too. Though frequency matters!