I left a long comment that largely agrees, but I want to add: do not just passively consume data! Working from a huge unlabeled dataset is useful when you know nothing about the structure of the data, but you do know something about the structure of yours. You don’t have to understand fashion that well to build a good curriculum, give yourself a better architecture, or add some side terms to your loss function. As with real neural networks, if you know anything at all about your domain, you will get faster results by incorporating that knowledge into your training structure.