I’ve been particularly impressed by 3.1 Pro’s ability to solve math problems. I have three problems of increasing difficulty that I like to pose to AIs (all requiring, or at least greatly aided by, postgraduate-level knowledge of mathematics).
Gemini 3.1 Pro and Opus 4.6 are the first models to solve the first one, or even come close to a correct solution. Opus was unnecessarily verbose and leaned on advanced mathematical jargon, while Gemini gave a much simpler, far more readable solution.
The second problem was eventually solved by Opus after a couple of false claims and some strong hints (and the final solution still had some inaccuracies), but Gemini breezed through it and gave a solution that was both more general and more elegant than the one I came up with. The problem and a solution sketch may exist in the training data as a single Reddit comment, but that didn’t seem to help Opus, and Gemini’s solution appears to be novel.
The third problem takes a long time to solve and requires several indirect steps. I suspect asking an LLM to one-shot the solution is simply the wrong format, and that a more Ralph-loop style approach might be appropriate. Opus was hopeless; I couldn’t get it to reason well about the problem even with direct hints. Gemini got the first important insight but then got lost, and only some strong hinting about what to look for eventually led it to a correct solution.
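To make that concrete, here is a minimal sketch of what I mean by a Ralph-loop style approach: rather than demanding a one-shot proof, re-run the model on the same problem, feeding its own partial progress back in each round. The `ask_model` and `looks_solved` functions are hypothetical stand-ins for whatever model client and verification step you actually use.

```python
def ask_model(prompt: str) -> str:
    # Hypothetical stand-in: plug in your LLM client of choice here.
    raise NotImplementedError

def looks_solved(attempt: str) -> bool:
    # Hypothetical stand-in: a verifier, proof checker, or human review.
    raise NotImplementedError

def ralph_loop(problem: str, max_rounds: int = 10) -> str:
    notes = ""  # partial progress carried across rounds
    for _ in range(max_rounds):
        attempt = ask_model(
            f"Problem:\n{problem}\n\n"
            f"Progress so far:\n{notes}\n\n"
            "Extend the partial work above by one concrete step, "
            "or correct it if a step is wrong."
        )
        if looks_solved(attempt):
            return attempt
        notes = attempt  # the next round sees the latest attempt
    return notes
```

The point of the loop is that each round only has to advance the argument by one step, which matches a long multi-step proof better than a single giant prompt does.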
I’ve noticed that while Opus’ mathematical output is often vague, jargon-filled, and difficult to understand, 3.1 Pro’s is much easier to read, and it seems to prefer direct, elementary techniques over appeals to advanced theorems. Even when it is wrong, it is quite easy to see where the specific incorrect step is. That makes the output potentially much more useful overall; I could see it legitimately helping with advanced mathematical work.