This seems closely related to the idea of Aligned AI Role-Model Fiction: create the persona you want for an aligned AI, write enough fiction about it and include that fiction in your model's training data so that it becomes a well-defined persona the base model is aware of, then use that persona as the target for your alignment methods.