i agree it’s often a good idea to do experiments at small scale before big scale, to get tighter feedback loops, but i think mnist in particular is probably a bad task to start with. i think you probably want a dataset that’s somewhat more nontrivial.
mnist is way too easy to solve. like, literally solvable-with-a-logistic-regression easy. as a result, mnist models are often not doing any interesting cognition.
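to make that concrete, here's a minimal sketch of the logistic regression point using sklearn's built-in 8x8 digits dataset as a stand-in for mnist (real mnist needs a download; the point, that a plain linear model basically saturates this kind of task, carries over):

```python
# hedged sketch: linear model on an mnist-like digit task
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# 8x8 handwritten digits, ~1800 examples, 10 classes
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# plain multinomial logistic regression, no conv layers, no tricks
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # well over 90% test accuracy
```

no interesting cognition required to get a strong score here, which is the problem.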
a lot of good ideas start working at some minimal scale that is higher than logistic regression. you will discard lots of good ideas because they don’t work on mnist. conversely, lots of things only work on mnist and other very easy datasets like cifar.
i think mnist is mostly useful for, like, sanity checking that a diffusion implementation isn’t fatally broken or something.
it is really not that much more expensive to look at a slightly more modern toy dataset. gpus like big matrices and they hate small matrices. mnist is so small that most of your computational cost is overhead. you can still train models in seconds on slightly harder datasets.
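the overhead point can be sketched with a cpu-side numpy timing as a stand-in for gpu behavior (the effect is even stronger on a gpu): the per-multiply-add cost of a tiny matmul is dominated by fixed call overhead, while a big matmul amortizes it away.

```python
# hedged sketch: per-element matmul cost, tiny vs big matrices
import time
import numpy as np

def per_element_time(n, reps):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up
    t0 = time.perf_counter()
    for _ in range(reps):
        a @ b
    # an n x n matmul does ~n^3 multiply-adds
    return (time.perf_counter() - t0) / reps / n**3

small = per_element_time(16, 200)   # mnist-ish tiny matrices
big = per_element_time(1024, 5)     # more modern model sizes
print(small / big)  # small matrices pay far more per element: overhead dominates
```

so you don't actually save much wall-clock time by shrinking the problem below the overhead floor.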
Which slightly more modern toy datasets come to mind?
I’ve heard good things about tiny-imagenet and fashion-mnist. even full imagenet is not that bad anymore with modern hardware.