Interestingly, looking back at ARC-AGI while writing the review, I notice that Gemini-3 Deep Think has crossed the winning threshold (85%) on the original ARC-AGI without getting much notice. Of course, it’s a big closed-source model, so it’s not eligible for the grand prize. But it’s fascinating to compare this to Chollet & Knoop’s original prediction that LLM-based approaches would stall out far below that threshold: