You asked for predictions at the start, so here was mine:
“I expect it to have some difficulty recognising everything in the pictures, and to miss approximately 1 step in the process (like not actually turning the kettle on). Ultimately I expect it to succeed in a sub-par way. 90% chance.”
It did worse than I predicted. The image recognition was significantly worse than I imagined, and Claude had to be helped along at most stages of the process. The transcript reads like someone with vision problems trying to guide you. Claude was mostly OK at planning the sequence of actions for actually making the coffee, but had a poor sense of where things were and what they were. It tried to delegate the job of finding things with commands like “look through the drawers to find x”.
Is it just an image problem? I would have been more impressed if Claude had noticed that it was making mistakes when recognising objects and changed strategy, asking you to take more photos from more angles. That’s what I would have tried if I were a vision-impaired controller in this scenario. Nonetheless, I would be interested to see what happens if you coupled Claude with an AI model built specifically for image recognition: pass the photos to the image recogniser, then pass the description it produces back to Claude.
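To make the suggestion concrete, here is a rough sketch of the two-stage setup I have in mind. Everything in it is an assumption for illustration: `describe` and `plan` are hypothetical stand-ins for whatever image-recognition model and Claude call you would actually use, not real library functions, and the prompt wording is made up.

```python
from typing import Callable, List

# Hypothetical interfaces; neither is a real library call.
Describer = Callable[[bytes], str]  # image bytes -> text description of the scene
Planner = Callable[[str], str]      # text prompt -> next instruction for the human


def next_instruction(
    photos: List[bytes],
    describe: Describer,
    plan: Planner,
    goal: str,
) -> str:
    """Describe each photo first, then hand only the text to the planner."""
    # Stage 1: the image recogniser turns every photo into a description.
    descriptions = [describe(photo) for photo in photos]

    # Stage 2: build a text-only prompt so the planner never sees raw images.
    lines = [f"Goal: {goal}", "Scene descriptions from the image recogniser:"]
    lines += [f"Photo {i}: {d}" for i, d in enumerate(descriptions, start=1)]
    return plan("\n".join(lines))
```

You would call it once per turn with the latest photos, e.g. `next_instruction(photos, describe=my_recogniser, plan=ask_claude, goal="make a cup of coffee")`, where both callables are whatever wrappers you write yourself. Whether this actually beats giving Claude the images directly is exactly the thing I would want to find out.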