Anti-fitting generalized reasoning test for o3h/o4 mh: https://llm-benchmark.github.io/ and https://www.lesswrong.com/posts/CEHsJzBCmuhEDdNxg/debunk-the-myth-testing-the-generalized-reasoning-ability-of
Disappointing. I thought it would be much better than Grok. It seems this version cannot be the one shown by ARC AGI in mid-December.