We observe a lack of transfer between GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano, which use the same tokenizer.
The other authors may have takes on the specific question you raise. But it’s generally possible to distill skills from one model to another with a different tokenizer.