I’d guess the biggest missing control is to fine-tune on some random/neutral diverse data (same compute, same number of steps, same LoRA setup) and then do EM fine-tuning. My impression is that getting EM can be finicky, and a simpler explanation is that any additional fine-tuning creates inertia against subsequent fine-tuning, rather than CAI developing any metacommunicative skills. It would also be helpful to see verbatim examples of the model’s generations.
Good point, this is closely related to Oliver Daniels’ comment below. I replied there with some details about the control strategies we’re exploring (neutral constitution, non-constitutional LoRA fine-tuning, etc.).