Current SOTA models do very well (~90% accuracy) at few-shot learning on the CIFAR-FS dataset [source], whose 32x32 images are comparable in resolution to what bees see, so I think this task is quite solvable. Both the bees and the models I discussed perform well above chance.
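For concreteness, here is a minimal sketch of the N-way K-shot evaluation protocol behind numbers like those. It uses a simple nearest-centroid (prototypical-network-style) classifier over synthetic "embeddings" rather than the actual SOTA method or real CIFAR-FS data; the function names and the synthetic setup are purely illustrative.

```python
import numpy as np

def run_episode(features, labels, n_way=5, k_shot=5, n_query=15, rng=None):
    """Sample one few-shot episode and score a nearest-centroid classifier."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.where(labels == c)[0])
        support.append((features[idx[:k_shot]], new_label))
        query.append((features[idx[k_shot:k_shot + n_query]], new_label))
    # Class prototypes: mean of the K support embeddings per class.
    prototypes = np.stack([s.mean(axis=0) for s, _ in support])
    correct = total = 0
    for q, label in query:
        # Assign each query point to the nearest prototype.
        dists = np.linalg.norm(q[:, None, :] - prototypes[None, :, :], axis=-1)
        correct += np.sum(dists.argmin(axis=1) == label)
        total += len(q)
    return correct / total

# Synthetic stand-in: 20 classes, 40 samples each, 64-dim "embeddings".
# A real evaluation would embed 32x32 CIFAR-FS images with a trained backbone.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(20), 40)
features = rng.normal(size=(len(labels), 64)) + labels[:, None] * 0.5
print(np.mean([run_episode(features, labels, rng=rng) for _ in range(100)]))
```

Averaging episode accuracy over many sampled 5-way 5-shot episodes like this is how the benchmark accuracies are typically reported.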
Interesting to learn that compute figures can be brought down so much without accuracy loss! Could you point me to some reading material about this?
FWIW, GPT-4.5 is still available for Pro-tier users.