Consider these possibilities for what benchmarks are doing here.
1. Training AIs on quantum physics questions directly makes them smarter.
2. Training on quantum physics makes the AI memorize quantum trivia, but doesn’t make it smarter in some deep sense.
3. Like 2, except that the existence of the benchmark makes grad student descent more effective: people learn which AI algorithms work best via human trial and error.
4. Like 3, except that public benchmarks just get memorized, making them useless for grad student descent.