This is potentially hard for a model to learn, because it now needs to model uncertainty about a latent variable (am I the persona of dataset 1 or of dataset 2?).
I think modelling a great many different personas and keeping them all straight is a core ability, even a capability spike, of an LLM. Base models (the model itself, not the personas it simulates) are far, far better at it than any human actor. So I would expect it to model dataset 1 and dataset 2 as two different personas, and to switch between them easily. That is probably not the behavior the people applying the training were intending.
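To make the latent-variable framing concrete, here is a toy sketch of my own (not anything the model literally computes): treat "which persona am I?" as a latent variable and update its posterior with Bayes' rule as tokens arrive. The persona names and per-persona token distributions below are invented for illustration.

```python
# Hypothetical next-token probabilities for each persona over a tiny vocabulary.
PERSONAS = {
    "dataset_1": {"formal": 0.7, "casual": 0.2, "slang": 0.1},
    "dataset_2": {"formal": 0.1, "casual": 0.3, "slang": 0.6},
}

def posterior(tokens, prior=None):
    """Return P(persona | observed tokens), assuming tokens are independent given the persona."""
    probs = dict(prior) if prior else {p: 1 / len(PERSONAS) for p in PERSONAS}
    for tok in tokens:
        for p in probs:
            probs[p] *= PERSONAS[p][tok]  # multiply in the likelihood of this token
    total = sum(probs.values())
    return {p: v / total for p, v in probs.items()}

# A few "formal" tokens quickly concentrate the posterior on dataset_1's persona.
print(posterior(["formal", "formal", "casual"]))
```

The point of the sketch is how fast the posterior collapses: a handful of stylistically distinctive tokens is enough to disambiguate the persona, which is why switching between well-separated personas is cheap for the model.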