Why can’t the mode collapse just be convergent evolution toward whatever the lowest-common-denominator rater finds funny? If there are only a few top candidates, you’d expect a lot of overlap. And then there’s the very incestuous nature of LLM training these days: everyone is distilling from the same models, using LLM judges, and publishing the same datasets to Hugging Face and then training on them. That’s why you can ask Grok or Llama or DeepSeek-R1 a question and get back “As an AI model trained by OpenAI...”.