Am I interpreting you correctly that the responses of both Opus 4 and o3 here are wrong according to the theorem?
Also, would the following restatement of the theorem be a correct understanding? The student model can't ever become worse (according to the teacher) when fine-tuned on (any) outputs from the teacher, on any distribution.
Yes and yes, basically. Although, to be clear: (i) “according to the teacher” should be “according to the loss used to obtain the teacher,” (ii) the theorem deals with the case of directly distilling on logits, whereas our LLM experiments involve sampling according to the teacher’s logits (which introduces noise), and (iii) the theorem only applies when you finetune on the unmodified teacher distribution—it doesn’t deal with the case where you filter the responses.
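To make caveat (ii) concrete, here is a toy numerical sketch (not the paper's actual setup; all logits are made up): direct logit distillation minimizes an exact KL against the teacher's full distribution, whereas training on sampled teacher outputs optimizes a Monte Carlo estimate of the same cross-entropy term, which is unbiased but noisy.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
teacher_logits = np.array([2.0, 0.5, -1.0])   # hypothetical teacher
student_logits = np.array([0.5, 1.0, -0.5])   # hypothetical student

p = softmax(teacher_logits)  # teacher distribution
q = softmax(student_logits)  # student distribution

# (a) Direct logit distillation: exact KL(p || q), the setting the theorem covers.
kl = float(np.sum(p * np.log(p / q)))

# (b) Sampling from the teacher: Monte Carlo estimate of the cross-entropy
# term -E_p[log q], built from tokens drawn from p. Same objective in
# expectation, but with sampling noise.
samples = rng.choice(len(p), size=1000, p=p)
mc_cross_entropy = float(np.mean(-np.log(q[samples])))
exact_cross_entropy = float(-np.sum(p * np.log(q)))
```

Since KL(p || q) and the cross-entropy differ only by the (student-independent) teacher entropy, minimizing the sampled cross-entropy targets the same optimum as direct distillation, just with extra variance.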