Might be caused mostly by data leaks (training set contamination).
Probably not. From the paper: ‘We used LeetCode in Figure 1.5 in the introduction, where GPT-4 passes all stages of mock interviews for major tech companies. Here, to test on fresh questions, we construct a benchmark of 100 LeetCode problems posted after October 8th, 2022, which is after GPT-4’s pretraining period.’
Good point. It’s a bit weird that performance on easy Codeforces questions is so bad (0/10) though.
https://twitter.com/cHHillee/status/1635790330854526981
LeetCode questions are not selected for novelty. In fact, the best way to get a problem turned into a LeetCode question is to post it to LeetCode’s discussion board and say you were asked it in an interview at a big tech company. So it’s still possible that some, or even many, of these questions appear nearly verbatim in the training data.