This was fun to read! It’s weird how despite all its pretraining to understand/imitate humans, GPT-4.1 seems to be so terrible at understanding humor. I feel like there must be some way to elicit better judgements.
You could try telling GPT-4.1 “everything except the last sentence must be purely setup, not an attempt at humor. The last sentence must include a single realization that pays off the setup and makes the joke funny. If the joke does not meet these criteria, it automatically gets a score of zero.” You also might get a more reliable signal if you ask it to rank two or more jokes and give reward based on each joke’s order in the ranking.
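The ranking-to-reward idea could be sketched as follows. This is a minimal illustration, not a real implementation: `rank_jokes` is a hypothetical stand-in for an actual LLM judge call (e.g. prompting GPT-4.1 to order all the jokes at once and parsing the result); only the reward computation is real here.

```python
def rank_jokes(jokes):
    """Hypothetical stand-in for an LLM judge.

    A real version would send all jokes to the model in one prompt and
    parse the returned ordering. Stubbed as the identity ordering here
    so the reward logic below is runnable.
    """
    return list(range(len(jokes)))


def ranking_rewards(jokes):
    """Assign each joke a reward in [0, 1] based on its rank.

    The judge's top-ranked joke gets 1.0, the bottom-ranked gets 0.0,
    with the rest spaced linearly in between.
    """
    order = rank_jokes(jokes)  # indices of jokes, best first
    n = len(jokes)
    rewards = [0.0] * n
    for rank, idx in enumerate(order):
        rewards[idx] = (n - 1 - rank) / (n - 1) if n > 1 else 1.0
    return rewards
```

The point of scoring by rank rather than asking for an absolute score per joke is that relative comparisons tend to be a less noisy signal than raw 1–10 ratings.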
Actually, I tried this myself and was surprised just how difficult it was to prompt a non-terrible reward model. I gave o4-mini the “no humor until the end” requirement and it generated the following joke:
I built a lab in my basement to analyze wheat proteins, wrote code to sequence gluten strands, and even cross-referenced pedigree charts of heirloom grains. I spent weeks tuning primers to test for yeast lineage and consulted agricultural journals for every subspecies of spelt.
Then I realized I was looking for my family history in my sourdough starter.
What does this even mean? It makes no sense to me. Is it supposed to be a pun on “pedigree” and “lineage”? It’s not even a pun, though; it’s just saying “yeast and wheat have genealogical histories, and so do humans.”
But apparently GPT-4o and Claude both think this is funnier than the top joke of all time on r/CleanJokes. (Gemini thought the LLM-written joke was only slightly worse.) The joke from Reddit isn’t the most original, but at least it makes sense.
Surely this is something that could be fixed with a little bit of RLHF… there’s no way grading jokes is this difficult.