Qumeric comments on Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Microsoft Research

Qumeric 23 Mar 2023 9:34 UTC
5 points
−5
Might be caused mostly by data leaks (training set contamination).
- Bogdan Ionut Cirstea 23 Mar 2023 11:15 UTC
  15 points
  11
  Probably not. From the paper: ‘We used LeetCode in Figure 1.5 in the introduction, where GPT-4 passes all stages of mock interviews for major tech companies. Here, to test on fresh questions, we construct a benchmark of 100 LeetCode problems posted after October 8th, 2022, which is after GPT-4’s pretraining period.’
  - Qumeric 23 Mar 2023 13:05 UTC
    15 points
    6
    Good point. It’s a bit weird that performance on easy Codeforces questions is so bad (0/10) though.
    https://twitter.com/cHHillee/status/1635790330854526981
  - Abe 24 Mar 2023 14:10 UTC
    2 points
    1
    LeetCode questions are not selected for novelty. In fact, the best way to get a problem turned into a LeetCode question is to post it to LeetCode’s discussion board and say that you were asked it in an interview at a big tech company. So a posting date after the cutoff does not guarantee that a problem is new, and it’s still possible that some or even many of these questions appear nearly verbatim in the training data.