I think a moderately-skilled person could outperform Claude here, but it’s closer than you might think. Have you thought of running this experiment with a human on the other end?
I occasionally give technical support for industrial automation equipment, and I feel for Claude. It’s so much harder than it looks, even when you have voice+video instead of text+pictures.
As one example of how it can go wrong, I said “Check the cables on the enclosure, and make sure they’re all connected properly.” instead of “Check the three cables on the enclosure (Power, ethernet, remote sensor module), and make sure each of them is connected properly.” It then took us 20 minutes to figure out that the reason it couldn’t communicate with the network was that the ethernet cable was completely missing.
There are hundreds of videos about the difficulty of giving precise directions, usually played for comedy. For example, here:
I think a moderately-skilled person could outperform Claude here, but it’s closer than you might think. Have you thought of running this experiment with a human on the other end?
Seconded, especially because the very low temporal resolution of intermittent still photos is a real challenge. I’m potentially open to playing as either LLM or robot here — the natural thing to do would be for @philh to play the robot again for consistency, but it would have to be either a different flat (ideal) or a different task (harder to compare). It might be better to have a Brit play the LLM, though; I certainly wouldn’t (for example) have recognized ‘Douwe Egberts’ as coffee. For fair comparison, I think it would be important to do it over text rather than a voice / video call.
I feel like what tripped Claude up here wasn’t a failure to provide precise enough instructions, or the OP not giving the instructions the benefit of the doubt enough times.
If OP is trying to simulate a capable robot which Claude controls, then I think the benefit of the doubt should be pretty much non-existent. Even asking clarifying questions etc. should be out, in my opinion.
Overall agreed, but I note that the video (which I enjoyed) is a significantly different challenge—when the dad starts sliding the bread around the table with his knife, he doesn’t give the kid a chance to say “ah, I see the problem! You need to dip the knife in the peanut butter...”