For what it’s worth, as a crude heuristic, I did a little Python programming to compute the Brotli compression ratio of every conversations.json file. Of the conversations whose compression ratios were above 14x, Trinity Large was number one (at 57x!), followed by all the other OLMo models, plus Google Gemini 2.5, Qwen3, and Llama 3.3. I think this fits pretty well with the qualitative picture in the post above (both the Arcee models and the OLMo models degenerated into verbatim repetition a lot more than the other models did).
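For anyone who wants to reproduce this, here is a minimal sketch of what such a ratio computation might look like. It assumes the `brotli` PyPI package and a `runs/<model>/conversations.json` directory layout; the layout and the 14x cutoff are my assumptions for illustration, not the exact script that was run.

```python
# Minimal sketch: Brotli compression ratio per conversations.json file.
# Assumes the `brotli` PyPI package and a runs/<model>/conversations.json
# layout; both are illustrative assumptions.
from pathlib import Path

import brotli


def brotli_ratio(path: Path) -> float:
    """Return original size / Brotli-compressed size for a file."""
    raw = path.read_bytes()
    compressed = brotli.compress(raw)
    return len(raw) / len(compressed)


if __name__ == "__main__":
    ratios = {
        p.parent.name: brotli_ratio(p)
        for p in Path("runs").glob("*/conversations.json")
    }
    # Highly repetitive transcripts compress much better, so a large
    # ratio is a rough proxy for degeneration into verbatim repetition.
    for model, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        flag = "  <-- above 14x" if ratio > 14 else ""
        print(f"{model:30s} {ratio:6.1f}x{flag}")
```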
Nice! I didn’t think to run this. Generally it seems like models with less post-training devolve much more into repetition, which makes some sense.