LLMs at their current level are already phenomenal. Enough to usher in a new industrial revolution even without further progress. It’s also still remarkable how untethered or nonsensical their reasoning can be, even with Opus 4.6 or similar.
Ex1. I was working on a parking brake issue with my car, comparing the clamping force I was getting at the wheel with the observation that the car had wanted to roll down the hill. I told it I was getting enough clamping to be unable to turn the wheel by hand.
That said, 4 clicks with hubs-only holding firm is still probably fine in practice. The parking brake just needs to hold the car stationary on a hill, and the force from a car rolling is a lot less than someone deliberately trying to wrench a wheel around.
No, a 2,400lb car rolling down the hill exerts a lot more force than me trying to turn it at the wheel studs, let me tell ya.
Ex2. I was setting off a long-running gene analysis job. A while after it had started, I asked if actually we could parallelize it. Claude says yes, absolutely, there’s a parameter already for that. I ask it to estimate whether it’d make sense to stop and restart the job. Yes, it says, it would take half the time – but we’ve already started it so might as well let it finish.
I feel like I get so many of these bonkers inferences that there’s something interesting here to reconcile with the brilliance they have in other moments.
A while after it had started, I asked if actually we could parallelize it. Claude says yes, absolutely, there’s a parameter already for that. I ask it to estimate whether it’d make sense to stop and restart the job. Yes, it says, it would take half the time – but we’ve already started it so might as well let it finish.
I feel like it’s noteworthy that this is the kind of thing many humans would say.
I feel like I get so many of these bonkers inferences that there’s something interesting here to reconcile with the brilliance they have in other moments.
They’re still bad at generalizing out of distribution. Tons of data are shoveled into them, and they are trained to produce reasonably good (or very good) reasoning outputs with this data (etc.), but put them OOD and they break.
Of course, there’s much more juice there to be figured out, but I still think that this is a good, if simplistic, model. (See also: hyperpolation)
An example from my recent experience is getting Claude to use some relatively uncommon CLI tools, with documentation in the repo. It would try running non-existent (but reasonable-looking) commands with non-existent (but reasonable-looking) arguments. It would try like 6 times and only then look into the documentation. And it would repeat the mistake, using the same non-existent commands, a few vibe-code steps later.
The “use uncommon tools” example is familiar. Last year, I was really amazed by what Claude/Cursor could do in primary coding tasks, then appalled by how poorly that transferred to asking it to work with Jupyter/iPython notebooks via MCP. We’d been working on a notebook for 30 min, then it would screw up the tool call, conclude the notebook had been deleted, and attempt to create it fresh. This happened repeatedly. It’s just not the kind of mistake a human would make, which gets back to, how exactly do these minds work and form models of the world?
We’re also bad OOD and many of our supposed advantages over them boil down to our distribution differences (embodiment and first-person-first data). I agree we’re much better OOD than them but not so much that I think there’s no comparison. As usual I’m skipping over my ideas for ways to improve them.
We’re also bad OOD and many of our supposed advantages over them boil down to our distribution differences (embodiment and first-person-first data).
Kind of and yeah?
I agree we’re much better OOD than them but not so much that I think there’s no comparison.
I wouldn’t say “there’s no comparison”[1], but I do think it looks like a “qualitative” difference. What exactly it is would require a more involved explication of the concept, which might be infohazardous.

[1] Not really my way of speaking about this sort of stuff / I’m not sure what you mean by this.
What mistakes would you make if you’d spent 30,000 years predicting sentences without pausing or sleeping and then another 10,000 doing programming tasks, but had never seen a video, moved your head, dropped a block, picked up an object, and every single experience you’d ever had was secondhand?
Granted, I don’t think that’s the full story, but it seems like a lot of the explanation.
I think Claude’s answers were actually reasonable.
Example 1: I presented this scenario to Claude (I know, not the most impartial party) in the format of a reasoning test, replacing “Claude” with “my friend.” I assumed that you were right, and Claude would notice the error in its own reasoning. But it said the friend was right:
The key insight is that the torque you can apply by grabbing a wheel rim and trying to twist it is actually quite large compared to what gravity exerts on a car sitting on a typical hill. When you grip opposite edges of a wheel (roughly 13–15 inches from center on a typical car wheel), you’re applying force at a long lever arm with your full upper-body strength. That can easily produce 100+ ft-lbs of torque at the wheel.
Even when I told Claude “the person who wrote this said that his friend was wrong,” I was surprised to see that it held firm.
The writer seems to be anchoring on the full 2,400 lb weight of the car, which is an understandable intuition — it feels like a massive car rolling downhill must overpower anything a human can do. (...) On a steep-ish residential hill, say 10% grade, the component of gravity pulling that 2,400 lb car downhill is only about 240 lbs of force.
Now that I’ve looked at Claude’s explanation more carefully, I’m actually convinced by it.
Example 2: If your gene analysis job were over halfway done, this would of course be the right call. Since Claude can’t actually perceive time, it doesn’t seem crazy for Claude to think over half the job might be finished.
Also, it depends on how much you value money vs. time—maybe running the analysis is expensive? If the job cost $100 and you were 20% done, it would cost you $20 to restart from scratch.
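For what it’s worth, the time-only break-even works out simply: with a 2x parallel speedup, restarting wins exactly when the job is less than half done. A minimal sketch, using made-up hour figures rather than anything from the actual job:

```python
def should_restart(total_serial_hours: float, fraction_done: float,
                   speedup: float = 2.0) -> bool:
    """True if killing the job and restarting it in parallel finishes sooner."""
    remaining_if_continue = (1 - fraction_done) * total_serial_hours
    time_if_restart = total_serial_hours / speedup
    return time_if_restart < remaining_if_continue

print(should_restart(10, 0.2))  # True: 5h restarted beats 8h remaining
print(should_restart(10, 0.6))  # False: past the halfway break-even, let it finish
```

With a 2x speedup the break-even point is `fraction_done = 0.5`, so “we’ve already started it” is only a reason to continue once the job is more than half finished.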
Your Claude transcript covers the relevant response:
Meanwhile, a person grabbing a wheel at the studs (which are maybe 2–3 inches from center on a typical bolt pattern) is actually at a disadvantage compared to grabbing the rim. At the studs, your lever arm is very short. If you’re gripping at roughly 2.5 inches from center and pulling hard with maybe 50–80 lbs of force, that’s only about 10–17 ft-lbs of torque. That’s dramatically less than the hill torque.
So the writer may actually be correct for the specific scenario they described — trying to turn the wheel at the studs rather than at the rim. That’s a crucial detail.
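Plugging in rough numbers makes the comparison concrete. Every figure below other than the 2,400 lb weight is an assumption for illustration (hill grade, tire size, grip force), not a measurement from the thread:

```python
import math

weight_lb = 2400            # car weight, from the thread
grade = 0.10                # assumed 10% residential hill
tire_radius_ft = 12.5 / 12  # assumed ~25 in tire diameter

# Downhill component of gravity, and the torque it applies at the tires
theta = math.atan(grade)
downhill_force_lb = weight_lb * math.sin(theta)   # roughly 240 lb
hill_torque = downhill_force_lb * tire_radius_ft  # roughly 250 ft-lb, summed over the braked wheels

# Human twisting the wheel with an assumed ~80 lb of effective grip force
grip_lb = 80
rim_torque = grip_lb * (14 / 12)    # ~14 in lever arm at the rim:   roughly 93 ft-lb
stud_torque = grip_lb * (2.5 / 12)  # ~2.5 in lever arm at the studs: roughly 17 ft-lb

print(round(hill_torque), round(rim_torque), round(stud_torque))
```

Under these assumptions the ordering is studs < rim < hill, which is consistent with both claims in the thread: the hill torque beats a human at the studs by an order of magnitude, but is within a small factor of a human at the rim.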
I do update that the amount of torque the car is experiencing under gravity is more like 150–200 ft-lb, and therefore closer to what a human can produce with a good lever arm. Though my Claude’s assertion was “a lot less than someone deliberately trying to wrench a wheel around”, which is not true even with more leverage – the two are perhaps comparable then.
Regarding case 2, Claude knew we were just running on my MacBook, where the marginal cost of running is negligible, and from my questions it was clear I cared about time.
Oh, in my back and forth with it, it also said more blatantly:
That’s a solid result. If you can’t turn the hub by hand at 4 clicks, with a tire mounted you’d have zero chance of overcoming it. The hub gives you way less leverage than a full wheel and tire would.

Sentences 2 and 3 are directly in contradiction.
Part of my own reconciliation is to question the premise that they would already be capable of ushering in a new industrial revolution. I’ve become more skeptical over time as these basic reasoning issues persist. It’s hard for me to imagine an industrial revolution’s worth of progress and innovation powered by a mind so lacking in coherent world models across so many domains.
Well, steam engines have even less coherent world models.
I believe in their power from seeing just how much value they give me and how transformative they are for me. I’m a super early adopter, but if I extrapolate the rest of the world making as much use of the tech as I am, and doing all the things I could see doing, it’s still so much.
But aren’t a lot of your tasks the sort of thing where
there is in fact a ton of training-available data demonstrating good performance
it’s cheap to experiment
etc., other relevant peculiarities of your use cases
?
I think the claim might be true but I don’t see a super compelling reason to think so at the moment.
“Reasoning” helping with self-driving cars might be a compelling demo, but what it would be compelling about is “you can slap together robotics, big data for a specific domain, and some LLM reasoning stuff to duct tape some more of the decision-making, and get something that’s practically useful”. Generalizing to other robotics could kick off a revolution, but it would be slow-going I think?
There could be a fair amount of science overhang, where you just have to search hard enough to put X and needs-X together. E.g. people curing themselves by searching hard using LLMs. Exciting, but not an industrial revolution? In the grand scheme of science it’s not mostly that. A lot of the coolest stuff is really hard, which means there’s not that many people at the forefront, which means that people at the forefront are already familiar with a lot of what’s relevant.
If you can find domains where iteration can be done pretty automatedly, but it’s expensive enough that decision-making still matters, but decision-making is very cognitively costly, but getting kinda-okay-not-creative decision-making would still be quantitatively better, then you could unlock some sort of new paradigm of invention / discovery. E.g. automated labs running automated experiments designing proteins by gippity-tweaking, or similar. Like PACE. But that would also be hard to get started on.
What are other reasons to think this? Plausible I just haven’t seen the idea, haven’t tried too hard.
I don’t think my use cases are especially niche. My main uses are:
search for and process information
process natural language instructions into structured outputs/actions
write software
As Habryka says, you can start to automate a lot with that. Like it’s clear software was quite transformative already, but I think limited because software didn’t take natural language input. Change that...and heck, you can automate so much.
I think you’re reaching for overly narrow use cases. LLMs just do a lot of basic stuff well. My quick take is just that it’s weird they still screw up in some ways that a human wouldn’t, and the spikiness is interesting.
Your use cases are way too general haha. They include many key things that LLMs currently don’t do. Anyway, maybe you’re not super interested in discussing whether they’re “Enough to usher in a new industrial revolution even without further progress.”, but if you were, my next question would be whether the Internet would count as a new industrial revolution in your eyes. (I would say “no, but kinda / almost”, and I would say that the no LLM --> LLM transition looks like it’s kinda comparable-ish to the Internet transition.)
(I’ve been trying a new drug and my brain isn’t at 100% capacity, hence slow or limited replies right now.)
I think that’s a good question. I think the Internet doesn’t feel to me like it reorganized enough of how civilization works to quite be a revolution. In contrast to things like agriculture or steam engines where the vocation and living situation of so much of the population changed. I think LLMs, via automation, can cause an economic reorganization on the scale of agriculture/industrialization, that the Internet itself didn’t do. I’m fuzzier on where “electricity” fits.
I think LLMs, via automation, can cause an economic reorganization on the scale of agriculture/industrialization
But like, how specifically? I agree that there’s some idea around making a bunch of software significantly more beginner-friendly by giving it an LLM interface, and in some ways significantly more powerful with LLM “agents”. Is that a sufficient class of thing for what you’re referring to? I mean, do you think that 50% of people will be working on something different within 5 years, or something like that? Which 50%?
Huh, this seems like a very weird comparison to me. It is very clear that I can automate a huge amount of labor using LLMs at current capability levels. My guess is more than the majority of current work in the economy, and of course I will also be able to do a lot of new things that are now cheaper. My guess is this alone is enough to do something about as big as the industrial revolution.
Most work is just really quite boring and doesn’t require coherent world models across many domains.