You might be interested in theserelatedresults. TL;DR: people have tried, but at the scale academics are working at, it’s very hard to get RL to learn interesting encoding schemes. Encoded reasoning is also probably not an important part of the performance of reasoning models (see this).
You might be interested in these related results. TL;DR: people have tried, but at the scale academics are working at, it’s very hard to get RL to learn interesting encoding schemes. Encoded reasoning is also probably not an important part of the performance of reasoning models (see this).
Thanks! Your second link is very similar to what I had in mind — I feel a bit embarrassed for missing it.