“Anonymous” and 540B parameters, hmm… I’m sure it’s not from the company named after an even larger number.
GSM8K = grade school math word problems
DROP = reading comprehension benchmark requiring discrete reasoning over paragraphs
OpenBookQA = question-answering dataset modeled after open-book exams for assessing human understanding of a subject; 5,957 multiple-choice elementary-level science questions
ANLI-A3 = round 3 of Adversarial NLI, a natural language inference benchmark designed to be challenging to current state-of-the-art models