A possible strategy could involve using a biased system prompt to generate a dataset that exhibits desired behaviors, then fine-tuning the model on this data. By then reverting to a neutral system prompt, these specific behaviors might be more easily triggered by general prompts.
This post does that. See Honest → Neutral and Don’t Exploit → Neutral:
This post does that. See Honest → Neutral and Don’t Exploit → Neutral: