I find Winograd schemas more useful as a guideline for crafting adversarial queries to stump AIs than as an actual test. An AI reaching human-level accuracy on Winograd schemas would impress me much less than an AI passing a traditional Turing test conducted by an expert who knows about Winograd schemas and is experienced with adversarial queries in general. The former is more susceptible to Goodhart’s law because of its rigid format and limited problem space.