RomanS comments on [linkpost] The final AI benchmark: BIG-bench

RomanS 11 Jun 2022 17:44 UTC
4 points
0
I agree with the sentiment, but I would be careful when interpreting the average human scores on AI benchmarks. Such scores are obtained under time constraints, and not all human raters may have been sufficiently motivated to do their best. The scores of the top humans are more likely to be representative of the general human ability to do the task.