toomy comments on Generating the Funniest Joke with RL (according to GPT-4.1)

toomy 17 May 2025 16:33 UTC
1 point
0
Very cool experiment!

If I wanted to play around with RL methods like this, is there a repo you can point me to? Or, even better, is your code available somewhere? I would love to try this with other concepts too, just to get a feeling for how powerful and misguided such RL on LLMs can be.
- agg 18 May 2025 7:13 UTC
  1 point
  0
  I ran this on runrl.com with the llm-as-judge option and the default settings for everything else. (Disclaimer: I work for runrl.com and thus have a lot of free credits to experiment with.)
  - toomy 18 May 2025 19:49 UTC
    1 point
    0
    I see. I assume it would cost me too much to just play around for fun and experimentation.
    - agg 18 May 2025 20:13 UTC
      1 point
      0
      I think each of these runs cost ~$40 (half an hour on one 8xH100 node at $80 per node-hour).