I have one query. How much better is it possible to do on this task? It bothers me that by stripping resolution, and giving the task to a being that only knows these training examples, the task may simply not be very solvable, so that these low accuracies come from algorithms doing barely better than chance.
Also note that for ResNet-12, or other variants of ResNet, there exist numerous techniques for cutting down the computational requirements by at least an order of magnitude with minimal accuracy loss.
The current SOTA models do very well (~90% accuracy) at few-shot learning tasks in the CIFAR-FS dataset [source], which has a comparable resolution to the images seen by bees, so I think that this task is quite solvable. Even bees and the models I discussed seem to do pretty well compared to chance.
Interesting to learn that compute figures can be brought down so much without accuracy loss! Could you point me to some reading material about this?
Two methods I have personally used:
quantization to int-8
A third way is “sparse” networks—many of the weights end up being near zero, and you can simply neglect those, but you need your hardware to support sparse matrix convolution.
All of these methods have the tradeoff of a small decrease in accuracy for a large decrease in required compute.
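To make the tradeoff concrete, here is a minimal sketch of two of the ideas above (int-8 quantization and dropping near-zero weights) applied to a single dot product. It is pure Python and only simulates the numerics; the actual compute savings depend on hardware and kernels that support int-8 or sparse operations. The function names, the random weights, and the 0.02 pruning threshold are all illustrative assumptions, not taken from any particular framework.

```python
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int-8 codes."""
    return [v * scale for v in q]

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose magnitude is below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical "layer": 1000 small random weights and one random input vector.
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]
inputs = [random.gauss(0.0, 1.0) for _ in range(1000)]

exact = dot(weights, inputs)

q, scale = quantize_int8(weights)
quantized = dot(dequantize(q, scale), inputs)

pruned_weights = prune_by_magnitude(weights, threshold=0.02)
pruned = dot(pruned_weights, inputs)
sparsity = sum(1 for w in pruned_weights if w == 0.0) / len(pruned_weights)

print("exact output:     ", exact)
print("int-8 output:     ", quantized)
print("pruned output:    ", pruned)
print("fraction of weights dropped:", sparsity)
```

In a real network you would measure accuracy on a validation set after each step rather than comparing a single output, and often fine-tune afterwards to recover some of the loss.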
And my point about “solvability” is that there is a certain amount of noise (entropy) in the images, such that a perfect classifier trained only on the image set, with infinite compute and the globally best-performing model, still cannot reach 100%, because the finite set doesn’t contain enough information. (And no, you cannot deduce the ‘seed’ of our universe and play it forward until that moment, as you do not have enough information to do that, even with infinite compute, at least if your only information input is the image set. You would find too many other universes that match the conditions. Human beings trying to solve the task manually aren’t a fair comparison because they are bringing in outside information that wasn’t in the set.)
So there is some true ceiling for any regression problem, and you would actually expect that a ‘good’ modern method might be acceptably close to the ceiling, or get there soon. (If the ‘true ceiling’ is 97% accuracy, a model that reaches 95% is good enough for engineering purposes.)
Or a simple example: for a mostly fair coin, you cannot predict the outcome of a future flip more accurately than the bias of the coin itself allows.
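The coin case is easy to check numerically. The sketch below assumes a coin that lands heads with probability 0.6 (my choice of "mostly fair"; the number is arbitrary) and compares two predictors: one that always guesses the more likely side, and one that guesses heads at random 60% of the time. The first sits at the ceiling of max(p, 1 - p); the second does worse, and no predictor that only knows the bias can beat the ceiling.

```python
import random

def accuracy_ceiling(p_heads):
    """Best achievable accuracy when the bias is all you know: always guess the likelier side."""
    return max(p_heads, 1.0 - p_heads)

def simulate(p_heads, n_flips, seed=0):
    """Return empirical accuracy of (always-majority, probability-matching) predictors."""
    rng = random.Random(seed)
    majority_guess = p_heads >= 0.5  # True means "heads"
    majority_hits = 0
    matching_hits = 0
    for _ in range(n_flips):
        flip = rng.random() < p_heads            # the actual outcome
        matching_guess = rng.random() < p_heads  # guess heads with probability p
        majority_hits += (flip == majority_guess)
        matching_hits += (flip == matching_guess)
    return majority_hits / n_flips, matching_hits / n_flips

p = 0.6  # assumed bias, for illustration only
majority_acc, matching_acc = simulate(p, n_flips=100_000)

print("theoretical ceiling:         ", accuracy_ceiling(p))
print("always guess likelier side:  ", majority_acc)
print("guess heads 60% of the time: ", matching_acc)
```

The same logic applies to the image task: whatever label noise is irreducible given only the training set plays the role of the coin's bias.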