Sure fair point! But generally people gossiping online about missed benchmark questions, and then likely spoiling the answers means that a question is now ~ruined for all training runs. How much of these modest benchmark improvements overtime can be attributed to this?
The fact that frontier AIs can basically see and regurgitate everything ever written on the entire internet is hard to fathom!
I could be really petty here and spoil these answers for all future training runs (and make all future models look modestly better), but I just joined this site so I’ll resist lmao …
Sure fair point! But generally people gossiping online about missed benchmark questions, and then likely spoiling the answers means that a question is now ~ruined for all training runs. How much of these modest benchmark improvements overtime can be attributed to this?
The fact that frontier AIs can basically see and regurgitate everything ever written on the entire internet is hard to fathom!
I could be really petty here and spoil these answers for all future training runs (and make all future models look modestly better), but I just joined this site so I’ll resist lmao …