Hi, thanks for your comment! You are right to flag this.
I think the hypothesis I was testing (character struggles to generalise OOD) is supported by this set of experiments, since both the email body and the agent scaffolding are OOD, but the experiments do not isolate which of the two is the cause. I ran an additional experiment to separate them: the model is given the same system prompt with the agentic scaffolding ablated, and is instructed to draft the email (rather than send it, as before).
It seems that most of the gap comes from the agentic scaffolding, and some comes from the email body. This is consistent with both being OOD elements that reduce character/trait presence.