It would be more interesting to me to see this comparison repeated for models with a higher level of training-data memorization/leakage, like Mixtral or Llama, per this paper’s results: https://arxiv.org/abs/2502.12896v1
Similarly, looking at summarization performed by a model from a different vendor would arguably provide a stronger test for OpenAI models.
Ooh, this sounds like a neat follow-up – thank you for sharing!