Good point. This is closely related to Oliver Daniels’ comment below; I replied there with some details about the control strategies we’re exploring (neutral constitution, non-constitutional LoRA fine-tuning, etc.).