I wonder if you could take the R1-Zero training regime, penalize/restrict using existing words from all languages (maybe only in the scratchpad, not the final response), and obtain a model which can solve math problems by reasoning in a non-existent language.
It may just use l33tc0d3, or nonexistent made-up words interpolated from neighboring languages.
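A minimal sketch of what such a reward might look like, assuming an outcome-based reward on the final answer plus a wordlist-based penalty applied only to the scratchpad (the function names, the toy `KNOWN_WORDS` set, and the penalty weight are all illustrative assumptions, not anyone's actual training setup):

```python
import re

# Placeholder wordlist; a real setup would need large dictionaries
# covering every natural language you want to penalize.
KNOWN_WORDS = {"the", "solve", "triangle", "por", "lo", "tanto", "donc"}

def scratchpad_penalty(scratchpad: str, weight: float = 0.1) -> float:
    """Penalty proportional to the fraction of scratchpad tokens that are
    recognizable words in some existing language."""
    tokens = re.findall(r"[a-zA-Z]+", scratchpad.lower())
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in KNOWN_WORDS)
    return -weight * known / len(tokens)

def reward(answer_correct: bool, scratchpad: str) -> float:
    """R1-Zero-style outcome reward on the final answer, with the
    penalty touching only the scratchpad, not the response."""
    base = 1.0 if answer_correct else 0.0
    return base + scratchpad_penalty(scratchpad)

# Example: a correct answer whose scratchpad leans on English words
# scores lower than one written in invented tokens.
print(reward(True, "solve the triangle, por lo tanto x = 3"))
print(reward(True, "zrk vem qalto, x = 3"))
```

Whether the model drifts toward a genuinely novel language or just toward the leetspeak/interpolation workaround likely depends on how strict that penalty is and how the wordlists handle near-misses.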