I think Claude’s answers were actually reasonable.
Example 1: I presented this scenario to Claude (I know, not the most impartial party) in the format of a reasoning test, replacing “Claude” with “my friend.” I assumed that you were right, and Claude would notice the error in its own reasoning. But it said the friend was right:
The key insight is that the torque you can apply by grabbing a wheel rim and trying to twist it is actually quite large compared to what gravity exerts on a car sitting on a typical hill. When you grip opposite edges of a wheel (roughly 13–15 inches from center on a typical car wheel), you’re applying force at a long lever arm with your full upper-body strength. That can easily produce 100+ ft-lbs of torque at the wheel.
Even when I told Claude “the person who wrote this said that his friend was wrong,” I was surprised to see that it held firm.
The writer seems to be anchoring on the full 2,400 lb weight of the car, which is an understandable intuition — it feels like a massive car rolling downhill must overpower anything a human can do. (...) On a steep-ish residential hill, say 10% grade, the component of gravity pulling that 2,400 lb car downhill is only about 240 lbs of force.
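As a quick sanity check on the 240 lb figure quoted above, here is a minimal sketch of the downhill component of the car's weight. It assumes "10% grade" means rise over run, and the function name `downhill_force_lb` is just an illustrative label, not something from the transcript.

```python
import math

def downhill_force_lb(weight_lb: float, grade: float) -> float:
    """Component of the weight acting along the slope.

    grade is rise over run (0.10 for a 10% grade), which is an assumption
    about how the quoted figure was meant.
    """
    slope_angle = math.atan(grade)          # slope angle in radians
    return weight_lb * math.sin(slope_angle)

force = downhill_force_lb(2400, 0.10)
print(f"{force:.1f} lb")  # prints "238.8 lb"
```

So "about 240 lbs" holds up: at small grades the sine of the slope angle is nearly equal to the grade itself.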
Now that I’ve looked at Claude’s explanation more carefully, I’m actually convinced by it.
Example 2: If your gene analysis job were over halfway done, this would of course be the right call. Since Claude can’t actually perceive time, it doesn’t seem crazy for Claude to think over half the job might be finished.
Also, it depends on how much you value money vs. time—maybe running the analysis is expensive? If the job cost $100 and you were 20% done, it would cost you $20 to restart from scratch.
Your Claude transcript covers the relevant response:
Meanwhile, a person grabbing a wheel at the studs (which are maybe 2–3 inches from center on a typical bolt pattern) is actually at a disadvantage compared to grabbing the rim. At the studs, your lever arm is very short. If you’re gripping at roughly 2.5 inches from center and pulling hard with maybe 50–80 lbs of force, that’s only about 10–17 ft-lbs of torque. That’s dramatically less than the hill torque.
So the writer may actually be correct for the specific scenario they described — trying to turn the wheel at the studs rather than at the rim. That’s a crucial detail.
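The stud-versus-rim comparison in the quote above can be checked with a rough sketch like the one below. The 50–80 lb pull and the 2.5 inch stud radius come from the quote. Reusing that same pull at a 14 inch rim radius (the middle of the quoted 13–15 inch range) is my own assumption, and `torque_ft_lb` is just an illustrative helper name.

```python
def torque_ft_lb(force_lb: float, lever_arm_in: float) -> float:
    """Torque in ft-lb from a force in lb applied at a lever arm in inches."""
    return force_lb * lever_arm_in / 12.0

# Lever arms in inches: studs per the quote, rim radius is an assumed midpoint.
lever_arms_in = {"studs": 2.5, "rim": 14.0}

for label, arm_in in lever_arms_in.items():
    low = torque_ft_lb(50, arm_in)    # weaker pull from the quote
    high = torque_ft_lb(80, arm_in)   # stronger pull from the quote
    print(f"{label}: {low:.1f} to {high:.1f} ft-lb")
```

Under these assumptions the studs give roughly 10 to 17 ft-lb, matching the quote, while the same pull at the rim gives several times more simply because the lever arm is longer.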
I do update that the torque the car experiences under gravity is more like 150-200 ft-lb, and therefore closer to what a human can produce with a good lever arm. But my Claude’s assertion was that it is “a lot less than someone deliberately trying to wrench a wheel around,” which is not true even with more leverage; at best the two are comparable.
Regarding case 2, Claude knew we were just running on my MacBook, where the marginal cost of running is negligible, and from my questions it was clear I cared about time.
Oh, in my back and forth with it, it also said more blatantly:
That’s a solid result. If you can’t turn the hub by hand at 4 clicks, with a tire mounted you’d have zero chance of overcoming it. The hub gives you way less leverage than a full wheel and tire would.
Sentences 2 and 3 directly contradict each other.