Nitpick:
The average Mechanical Turker gets a little over 75%, far less than o3’s 87.5%.
Actually, average Mechanical Turk performance is closer to 64% on the ARC-AGI evaluation set. Source: https://arxiv.org/abs/2409.01374.
(Average performance on the training set is around 76%, which is what this graph seems to report.)
So I think the graph you pulled these numbers from is slightly misleading.