I agree that training on more LLMs would be good. However, I would like to note that gpt-oss-20b, one of the models we train on, is a sparse MoE reasoning model released half a year ago.
My apologies, I stand corrected. Then your model choice makes a lot more sense than I thought. And I am now more surprised by your results.