Ah, so the idea is that task preferences are encoded in the model's activations on a task's description, even when the model isn't generating text about any choice involving that task?
Yes, and the tasks are all prompts that actually ask the model to complete them, but indeed independent of any choice framing.