This is a neat question, but it’s also a pretty straightforward recall test because descriptions of the experiment for teachers are available online.
Imagine the question were instead inverted to “a physics teacher wants to do an experiment demonstrating the speed of light to children. What would the experiment be?” That would be straightforward recall, because the model can look up what physics teachers do. But in the question’s actual form, what lookup would solve it trivially?
The metric is not whether the information to answer this question is available in a model’s corpus, but whether the model can make the connection between the question and the information in its corpus. Cases in which it isn’t straightforward to make that connection are riddles. But that’s also a description of a large class of research breakthroughs – figuring out that X solution from one domain can help answer Y question from another domain. Even though both X and Y are known, connecting them was the trick. That’s the ability I wanted to test.
I tried googling to find the answer. First I tried “melting chocolate in microwave” and “melting chocolate bar in microwave”, but those just brought up recipes. Then I tried “melting chocolate bar in microwave test”, and the experiment came up. So I had to guess it involved testing something, but from there it was easy to solve. (Of course, I might’ve tried other things first if I didn’t know the answer already.)