Zvi recently shared news about the impressive open model Kimi K2 Thinking. It currently ranks near the top across the leaderboards (you can find several of them via my monitoring page).
I have briefly tried it and probed it with my own hard tests on metacognition, ontological coherence, and moral epistemics. On these edge cases, where frontier LLMs fall short of humans, it performs on par with Claude 4.5, Gemini 2.5, GPT 5, and GPT 5 mini. It also self-reports its limitations more accurately than Claude, and perhaps better than GPT 4 and 5 mini (I have not done enough testing to say).