Well, let’s not set the bar too high. E.g. “convinces 90% of a panel of psychologists, cognitive scientists, neuroscientists, and Natural Language Processing researchers in an hour-long interrogation”.
Somebody else mentioned Winograd schema testing, which is justified because it targets specific weaknesses of current Question Answering / NLP approaches.