Very cool experiment!
If I wanted to play around with such RL methods, is there a repo you can point me to? Or, even better, is your code available somewhere? I would love to try this with other concepts too, just to get a feeling for how powerful and misguided such RL on LLMs can be.
toomy
I see. I assume it would cost me too much to play around with just for fun and experimentation.