[Question] Parameter count of ML systems through time?

Pablo Villalobos and I have been working to compile a rough dataset of parameter counts for some notable ML systems through history.

This is hardly the most important metric about these systems (other metrics we would like to understand better are training and inference compute, and dataset size), but it is nonetheless an important one and particularly easy to estimate.

So far we have compiled what is, to our knowledge, the biggest dataset of parameter counts to date, with over 100 entries.


But we could use some help to advance the project:

  1. Is there any previous relevant work? We are aware of the AI and compute post by OpenAI, and there are some papers with some small tables of parameter counts.

  2. If you want to contribute an entry, please do! The key information for an entry is a reference (citation and link), domain (language, vision, games, etc.), the main task the system was designed to solve, parameter count (explained with references so it's easy to double check), and date of publication. The criteria for inclusion are not very well defined at this stage in the process; we have been focusing on notable papers (>1000 citations), significant SOTA improvements (>10% improvement on a metric over the previous system) and historical relevance (subjective). We mostly have ML/DL/RL papers, and some statistical learning papers. To submit an entry, either leave an answer here, send me a PM, email jaimesevillamolina at gmail dot com or leave a comment in the spreadsheet.

  3. If you’d be interested in joining the project, shoot me an email. The main commitment is to spend 1h per week curating dataset entries. Our current goal is to compile the parameter count of one system per year and per main domain between 2000 and 2020. If you can compute the number of parameters of a CNN from its architecture, you are qualified. I expect participating will be most useful to people who would enjoy having an excuse to skim through old AI papers.
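
To give a sense of what computing the number of parameters of a CNN from its architecture involves, here is a minimal sketch in Python. The helper names (`conv2d_params`, `dense_params`) and the layer list are made up for illustration: the layers describe a simplified LeNet-5-style network with fully connected convolutions and no trainable pooling, which is an assumption and will not exactly match the count for the original paper.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Parameters of a standard 2D convolution (hypothetical helper).

    Each output channel has one k_h x k_w filter per input channel,
    plus one optional bias term.
    """
    k_h, k_w = kernel_size
    weights = k_h * k_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Parameters of a fully connected layer (hypothetical helper)."""
    return in_features * out_features + (out_features if bias else 0)


# Assumed architecture: a simplified LeNet-5-style network.
# Pooling layers are omitted because here they have no trainable parameters.
layer_counts = {
    "conv1": conv2d_params(1, 6, (5, 5)),
    "conv2": conv2d_params(6, 16, (5, 5)),
    "fc1": dense_params(16 * 5 * 5, 120),
    "fc2": dense_params(120, 84),
    "fc3": dense_params(84, 10),
}

total_params = sum(layer_counts.values())

for name, count in layer_counts.items():
    print(f"{name}: {count}")
print(f"total: {total_params}")  # total: 61706
```

A per-layer breakdown like this is also a convenient way to explain a parameter count in an entry, since each term can be checked against the paper's architecture description.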

Thank you to Girish Sastry and Max Daniel for help and discussion so far!