Some additional evidence: o3 used 5.7B tokens across the 100-task semi-private evaluation set (about 57M per task) to achieve its ARC-AGI score of 87.5%; in low-compute mode it scored 75.7% using 33M tokens (about 330K per task):
https://arcprize.org/blog/oai-o3-pub-breakthrough
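
To put the two figures side by side, here is a rough sketch of the arithmetic in Python. It only uses the token counts and scores quoted above; the variable and function names are made up for illustration, and the ratio comes out the same whether the counts are read as totals or per task.

```python
# Figures as quoted in the comment above; names are hypothetical.
HIGH_COMPUTE_TOKENS = 5.7e9   # tokens for the 87.5% run
LOW_COMPUTE_TOKENS = 33e6     # tokens for the 75.7% run
HIGH_COMPUTE_SCORE = 87.5     # percent
LOW_COMPUTE_SCORE = 75.7      # percent


def token_ratio(high_tokens: float, low_tokens: float) -> float:
    """Return how many times more tokens the high-compute run used."""
    return high_tokens / low_tokens


ratio = token_ratio(HIGH_COMPUTE_TOKENS, LOW_COMPUTE_TOKENS)
score_gain = HIGH_COMPUTE_SCORE - LOW_COMPUTE_SCORE  # percentage points

print(f"{ratio:.0f}x the tokens for +{score_gain:.1f} percentage points")
```

So the high-compute run spent on the order of 170 times as many tokens for a gain of about 12 percentage points.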