I assume you mean MMMU? Looks like the last jump took the score from 70.4% → 75% (+4.6 points), compared to just 75% → 76.5% (+1.5 points) this time. I don’t think that’s a big difference, but I was wrong to say the improvement was “pure” reasoning improvement, my bad.