Interesting experiment! If I may ask, why did you decide to use Claude for this? ChatGPT and Gemini are still stronger than Claude in terms of vision capability. If you can bear it, you should try this again with different models and create a CoffeeBench.
Claude is what I had installed as an app on my phone. I don’t remember any particular reason I installed that app and not the others. I hadn’t heard at the time that Claude had the weakest vision, but even if I had, I’d probably have used it anyway for convenience.
I’d be vaguely interested to see how other models perform, but not interested enough to put in the effort.