If true, would this imply you want a base model to generate lots of solutions and a reasoning model to identify the promising ones and train on those?