gwern comments on Ram Potham’s Shortform

gwern 9 Apr 2025 16:58 UTC
It’s good someone else did it, but it has the same problems as the paper: it hasn’t been updated since May 2024, and it’s limited to open-source base models. So it needs to be restarted and extended with approximate estimators for the API/chatbot models too before it can provide a good universal capability benchmark in near-realtime.