Okay, this is a weak example of alignment generalization failure! To check:
We gave a task relatively far out of distribution (because the model almost definitely hasn’t encountered this particular complex combination of tasks).
The model successfully does the task.
But alignment is totally botched.