How is it that bad at Codeforces? I competed a few years ago, and in my time Div 2 A and B were extremely simple, basically just "implement the described algorithm in code". If you submitted them quickly (which I would expect GPT-4 to excel at), it was easy to reach a significantly better rating than the one reported in this paper.
I hope they didn't make a mistake by misunderstanding the Codeforces rating system: after a contest, Codeforces only awards a fraction of the difference between your estimated performance rating and your current rating. It is, however, possible to calculate the exact rating equivalent of a given performance from the provided data if you know the details (which I've forgotten).
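To make the distinction concrete, here is a minimal Python sketch assuming a simplified Elo-style update in which only a fraction k of the gap is awarded after a contest. The function names, the value of k, and the update rule are illustrative assumptions, not the actual Codeforces formula:

```python
# Sketch of a simplified rating update (assumption: only a fraction k
# of the gap between contest performance and current rating is awarded;
# the real Codeforces formula is more involved, and k = 0.5 is made up).

def awarded_delta(current_rating: float, performance: float, k: float = 0.5) -> float:
    """Rating change awarded after one contest under this simplified model."""
    return k * (performance - current_rating)

def implied_performance(current_rating: float, delta: float, k: float = 0.5) -> float:
    """Invert the update: recover the performance rating that the
    awarded delta actually corresponded to."""
    return current_rating + delta / k

# Example: an account at 1400 that gains +50 in one contest actually
# performed at 1500 under these assumptions, not 1450.
print(implied_performance(1400, 50))  # -> 1500.0
```

The worry, under this model, is that reporting the post-contest rating instead of the inverted performance rating would understate how well the model actually did.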
When I searched the paper for the exact methodology (by Ctrl-F'ing "codeforces"), I didn't find anything.
Codeforces is not marked as having a GPT-4 measurement on this chart. Yes, it’s a somewhat confusing chart.
I know. I skimmed the paper, and there is a table above the chart showing every model's results on the tasks (since every model's Codeforces performance is below 5%, they overlap on the chart). I replied to this comment because it seemed thematically the most appropriate (it asks about task performance); sorry if my choice of where to comment was confusing.
From the table:
GPT-3.5's Codeforces rating is "260 (below 5%)"
GPT-4's Codeforces rating is "392 (below 5%)"