I disagree with framing these results in terms of “dishonesty” or “intentional deception”.
Or, at least, it’s severely under-argued that this framing is more accurate than “more capable models produce more accurate statements by default and are also better at taking on any role that you imply in the prompt”.