This is a cool proposal! I agree with Caleb below that a potential issue is that the model may just output its final answer in its CoT, in which case the task wouldn't require much reasoning. I'm not sure how to fix this. You could try something like what Caleb proposes: prompt the trained model to output a ruleset that the untrained model gets to see in context when trying to solve the problem (but which is not specific to the individual question, and so can't give away specific answers). However, if the model really has no verbalization ability at the start, this approach might be hard to get off the ground.