Model to track: You get 80% of the current max value LLMs could provide you from standard-issue chat models and any decent out-of-the-box coding agent, both prompted the obvious way. Trying to get the remaining 20% that are locked behind figuring out agent swarms, optimizing your prompts, setting up ad-hoc continuous-memory setups, doing comparative analyses of different frontier models’ performance on your tasks, inventing new galaxy-brained workflows, writing custom software, et cetera, would not be worth it: it would take too long for too little payoff.
There is an “LLMs for productivity!” memeplex that is trying to turn people into its hosts by fostering FOMO in those who are not investing tons of their time into tinkering with LLMs. You should ignore it. At best it would waste your time; at worst it would corrupt your priorities, convincing you that you should reorient your life around “optimizing your Claude Code setup” or writing productivity apps for yourself. LW regulars may be especially vulnerable to it: we know that AI is going to become absurdly powerful sooner or later, so it takes relatively little to sell us the idea that it already is absurdly powerful – which may or may not currently be exploited by analogues of crypto grifters.
(Not to say you mustn’t be tinkering with LLMs and vibe-coding custom software, especially if you’re having fun! But you should perhaps approach it in the spirit of a hobby, rather than the thing you should be doing.)
Well, at least, that’s my takeaway from watching the current ideatic ecosystem around LLMs and trying that stuff for myself (one, two, three). I do have tons of ideas about custom software that perhaps could 1.1x my productivity… but it’s too complex for the LLMs of today to vibe-code in a truly hands-off manner, and is not worth the time otherwise. Maybe in six more months.
Obviously “reverse any advice you hear” and “Thane has terminal skill issues and this post is sour grapes” may or may not apply. (Though, of course, “you have skill issues if you haven’t figured out how to 10x your productivity using LLMs, you must keep trying or you’ll be left behind in the permanent underclass!!!” is the standard recruitment pitch of the aforementioned memeplex.)
I think I directionally disagree with this for most people? My guess is the average person on LW should be spending around 10 hours a week trying to figure out how to automate themselves or other parts of their job using LLMs. It seems to me to be where most of the edge is in terms of increasing productivity and impact for most people (though of course not everyone).
Well, depends on the job, I suppose. I did read your post on the topic, and I’m guessing it indeed makes much more sense in the context of automating parts of a company, with lots of time-consuming but boilerplate-y tasks.
As someone doing math/conceptual research, I don’t currently see much potential there. I can imagine stuff that would be useful for me, e.g.:
Systems that would reduce the time needed to assemble the context for getting LLMs’ help with research/brainstorming tasks.
Systems that would remove the friction in getting LLMs’ assistance with math proofs.
Pipelines for quickly extracting insights from papers en masse.
A custom analogue of OpenAI’s Pulse, where an LLM swarm’s context is updated with my latest thoughts regarding what I’m working on, and it asynchronously searches the literature 24/7 for anything helpful.
Some sort of “exploratory medium for mathematics”.
But none of this would be an equivalent of even a 10h/week productivity boost, I don’t think.
To clarify, being able to speed-read a paper with an LLM or do a literature review using a Deep Research feature is very helpful for me. But this is the “80% of the value that you can get just by using the out-of-the-box tools the obvious way” I was talking about. Stuff on top of that mostly isn’t worth it.
IMO, the correct approach for most people is more along the lines of “try to be passively aware that LLMs exist now, and be constantly on the lookout for things where they could be easily applied for significant benefits”, rather than “spend N hours/week integrating them into your workflows in nontrivial-to-implement ways”.
FWIW, inspired by Justis, I’ve been keeping up a list of things that I could usefully automate with Claude Code (or similar) for my own personal productivity, adding to the list every time something pops into my head. I’ve been adding to the list for the past three weeks. But so far it’s a very underwhelming list! Here’s ~the whole thing:
Custom interface for composing tweet-threads, including their funny formula for counting characters (I have some complaints about the built-in twitter one, e.g. I usually also post them onto bluesky)
Jeff’s “clipboard normalizer” (but I have a PC, not a Mac)
…And something similar for clipboard conversion from simple HTML into the abstruse “typst” format that I was using a few weeks ago for a particular project.
One-click way to move certain things to my Trello to-do list, e.g.
LessWrong notifications
Interesting-looking papers or links to read from social media (twitter, slack, discord)
Emails
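For what it’s worth, the “funny formula” for the first item is mostly two real rules: every URL is billed at a flat 23 characters (the t.co wrapper), and the limit is 280, with some Unicode ranges counting double. A minimal Python sketch of the URL part – the double-weight ranges are omitted, so treat this as an approximation rather than Twitter’s actual algorithm:

```python
import re

TWEET_LIMIT = 280
URL_WEIGHT = 23  # every URL is wrapped by t.co and billed at a flat 23 characters

URL_RE = re.compile(r"https?://\S+")

def tweet_length(text: str) -> int:
    """Approximate Twitter's character count: each URL costs a flat 23.

    Simplification: real counting also weights some Unicode ranges
    (CJK, emoji) as 2; this sketch counts everything else as 1.
    """
    stripped = URL_RE.sub("", text)
    n_urls = len(URL_RE.findall(text))
    return len(stripped) + n_urls * URL_WEIGHT

def fits_in_tweet(text: str) -> bool:
    return tweet_length(text) <= TWEET_LIMIT
```

This is the sort of thing a tweet-thread composer would call per-chunk when splitting a long draft.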
Anyway, all of these seem like they would save me a pathetically small amount of time, and so I haven’t bothered to install Claude Code yet. But someday the list will be longer, or I will be bored and curious enough to do it regardless.
Meanwhile, I 80/20’d the second one (clipboard normalizer) just using a normal LLM chat interface: Gemini one-shotted a nice HTML + JavaScript solution that I stored locally and bookmarked. It adds an extra couple of seconds compared to an app or Chrome extension, but whatever, I don’t use it that often anyway.
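For reference, the core of a clipboard normalizer like that is small. Here’s a rough Python sketch of the same idea using only the standard library (the original was an HTML + JavaScript page; the set of tags treated as line breaks is my guess at minimal useful behavior, not what Gemini produced):

```python
from html.parser import HTMLParser

class HTMLToText(HTMLParser):
    """Strip markup from simple HTML, keeping paragraph/line breaks."""

    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Block-level tags become newlines; inline tags (b, i, a...) vanish.
        if tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

    def text(self) -> str:
        return "".join(self.parts).strip()

def normalize(html: str) -> str:
    parser = HTMLToText()
    parser.feed(html)
    return parser.text()
```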
I’ll keep brainstorming, but I dunno, I really don’t seem to do much that can be automated at all, and that I haven’t already automated years ago in the old-fashioned way (e.g. I have long had automatic file backups, automatic credit card payments, automatic bank transfers, automatic citation downloading, etc.).
But none of this would be an equivalent of even a 10h/week productivity boost, I don’t think.
To be clear, I think it’s worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems, and my guess is there will be a future where these things will be very helpful for almost everyone, and skill will translate non-trivially.
spend N hours/week integrating them into your workflows in nontrivial-to-implement ways
I currently disagree. In my experience you do actually experience substantial downlift for a while, and it is worth getting good at having that not happen to you.
I think it’s worth spending 10h/week even if you expect to get less than 10h/week in productivity boost right now, because it does take a while to get good at using these systems
I am aware of this argument. Counterpoint: models get increasingly easier to use as they get more powerful – better at inferring your intent, not subject to entire classes of failure modes plaguing earlier generations, etc. – so the skills you’ll learn by painstakingly wrangling current LLMs will end up obsoleted by the subsequent generation.
Like, inasmuch as one buys that LLMs are on the trajectory to becoming absurdly powerful, one should not expect to need to develop intricate skillsets for squeezing value out of them. You’re not gonna need to prompt-engineer AGIs and invent custom scaffolds for them, they will build the scaffolds for themselves and your cleverest prompts will be as effective as “just talk to them the obvious way”. (Same for ad-hoc continuous-memory setups and context-management hacks et cetera: if the AGI labs crack architectural continuous learning, it’ll all be obsoleted overnight.)
On the other hand, inasmuch as you don’t believe that LLMs are going to be getting increasingly easier to use, you essentially don’t believe that they’re on the trajectory to become absurdly powerful AGIs. If so, you should downgrade your expectation of how much value their future generations will bring you, and accordingly downgrade how much you should be investing in them now.
Oh, by the way: I saw you saying that you’re observing much more software downstream of LLMs. Any chance you can elaborate on that, provide some examples? This is the sort of thing I’m very interested in tracking, and high-quality information sources are hard to come by.
It’s clear to me that the product velocity of things like Cursor, Claude Code and Codex is much higher than I’ve seen for basically any other product. This is what I meant by saying most of the software I’ve seen has been for software developers themselves.
We are now starting to see this trickle out. Internally at Lightcone more of my staff can now build software solutions to problems where they previously needed support from a software engineer (a random example of this is building Airtable automations with script blocks). My guess is if you surveyed Hacker News you would also see that more things on there are small applications that someone built that previously would have taken prohibitively long to build. This is a random example of one such project: https://www.ismypubfucked.com/
The improvements in the models’ thinking quality don’t address one of the main causes of downlift, which is the breaking up of deep work: regularly (and sometimes surprisingly) having 1-10 minute periods where you are no longer able to do productive work because the LLM is executing a task, so you lose cognitive context and tend toward shallower decision-making. This is something that continues to plague me, often causing me to waste a lot of time (both in the individual chunks and when summing my decision-making over a day).
Not convinced this isn’t a temporary artefact of the current time horizons. Like, in the future, I think it’s plausible that the two categories of tasks you’d be delegating would be either (a) the sort of shallow tasks the future models would be able to complete instantly, or (b) the sort of deep tasks that’d take future models hours to complete.
Fair enough, though, maybe this counts. But is there really a rich suite of skills like that, and would they really take that long to learn by the time learning them does become immediately net-positive?
I think it’s fairly likely I need to re-orient my entire workflow around constantly (but somewhat surprisingly) having heavy-tail distributions of time where I can’t do productive work on my main work. This is not a small deal. I suspect that many people will deal with it very differently.
Here are some possible responses:
Build a practice of having multiple parallel LLM projects you can work on simultaneously (I have not found this cognitively trivial)
Build up a backlog of simple low-context tasks you can do, and figure out how to turn your lower-importance work into that kind of task
Learn how to identify tasks that aren’t worth it because of the downlift, even though you know an AI could do them.
The first two really sound quite complex, and the third sounds genuinely hard. I suspect other people will find other solutions...
My guess is the average person on LW should be spending around 10 hours a week trying to figure out how to automate themselves or other parts of their job using LLMs.
Yeah. I am nowhere near doing this systematically, but I noticed that whatever I am doing, it makes sense to ask “could I use an LLM to help me with this?” That includes even things like reading Reddit—now the LLM could read it for me, and just give me a summary. (I haven’t tried this yet.)
It is even worth revisiting the old (pre-LLM) question of “could I automate this using a shell/Python script?”, because LLM makes creating such scripts much cheaper.
Like, if in the past the balance was like “it takes me one hour to do it by hand, and it would also take nontrivial time to write the script, plus I might find out in the middle of it that the situation is more difficult than I thought and there are some exceptions, or I might end up exploring some rabbit hole… so all things considered it’s probably faster doing it by hand”, these days making the script sometimes only takes as much time as you need to verbally describe the intended functionality.
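To make that concrete, here’s the kind of throwaway script an LLM can now produce from a one-sentence description (the filename convention and folder layout are made-up examples, not anything from this thread):

```python
import re
import shutil
from pathlib import Path

def sort_papers_by_year(src: Path, dst: Path) -> int:
    """Move PDFs like 'smith2021attention.pdf' into dst/2021/, etc.

    Files without a recognizable 4-digit year are left in place.
    Returns the number of files moved.
    """
    moved = 0
    # Materialize the listing first, since we move files while iterating.
    for pdf in sorted(src.glob("*.pdf")):
        match = re.search(r"(19|20)\d{2}", pdf.name)
        if not match:
            continue
        year_dir = dst / match.group(0)
        year_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(year_dir / pdf.name))
        moved += 1
    return moved
```

Writing this by hand is twenty minutes of fiddling with edge cases; describing it to an LLM takes one sentence, which is exactly the balance shift being described.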
datapoint: this was my exact argument for not learning to vibecode (after working as a programmer for 10 years and quitting 9 years ago). Last month was when (I noticed that) vibecoding (had) crossed the threshold where it quickly paid off the time I put into it, and that was with private tutoring from someone who’d been on the cutting edge for >1 year.
I’m not sure if this supports your argument (because I do think any time I put into learning to vibecode before the recent transition would have been wasted) or counters (because this is the month things transitioned).
I lean towards this, despite being a guy currently heavily invested in AI tools.
Cluster of things that all seem true to me:
I am 100% addicted to vibecoding in a straightforward “this is fun and dopamine inducing way”, which is making it hard to reason about.
As fun hobbies go, it does seem fine, all else equal
You’re broadly right about the 80%/20% rule, and microoptimizations not being worth it.
But:
There is some infrastructure that will probably make sense to have as the AIs scale up, which AI companies either won’t provide, or which you probably don’t want to trust. (This is a bet; I’m not that confident.)
Learning to wield AI is going to be important (for at least many people. I think it’s more straightforwardly important for software engineers than theoretical researchers).
Some of that infrastructure is stuff that AI can basically one shot. I think the skill/habit to cultivate here is “check if it 1 or 2 shots it. If not, bail.”
I am 100% addicted to vibecoding in a straightforward “this is fun and dopamine inducing way”
I’m curious, how does that work? What mindset are you approaching it from? What sorts of projects (in terms of their… emotional felt-sense, I guess) are you attempting with it?
I think I would like to be able to engage with it as with a hobby, but it’s not been fun for me.
For me it’s like “I type some quick stuff in, and then, like, agency comes out and I get to see stuff get built, and it works great 20% of the time, okay 60%, and fails 20% of the time, but that produces a kinda Skinner-box slot-machine element to it.” (To be clear, I think the Skinner-box bit is bad; the “stuff comes out with little effort” part is great. It’s like jamming with a partner who can do most of the tedious parts of the work.)
My impression from your other posts is that you are mostly just getting a much worse hit rate (because yeah if it’s not really set up to excel in a domain, it’s a lot less workable)
My impression from your other posts is that you are mostly just getting a much worse hit rate
Thanks!

No, the hit rate sounds mostly similar. I think it’s more that I may have unusually strong anti-gacha instincts? Like, if I’m doing something, momentarily reflect on it, and recognize that it’s equivalent to playing a slot machine, this immediately causes negative feelings in me and sours the whole experience. Which I guess is usually a good adaptation to have, but may or may not be anti-helpful in this specific case.
I think this is incorrect, but that agent swarms etc. are mostly not helpful, and that the large productivity boosts are specific to domains or situations.
Two from my side: Claude Code got much better once I got it successfully working with a REPL (which made the feedback loop much faster, let me inspect the outputs etc.) and once I wrote up a fair bit of documentation on how to use our custom framework.
Edit: I forgot that not everyone works in software. I am much less confident that this applies in other domains today.
An interesting piece of potential evidence in favor of this is that METR time-horizon measurements didn’t vary significantly for ChatGPT and Claude models when using a basic scaffold as compared to the specific Claude Code and Codex harnesses.
https://metr.org/notes/2026-02-13-measuring-time-horizon-using-claude-code-and-codex/
two useful things:
putting things in the new larger context windows, like the books of authors you respect, and having the viewpoints discuss things back and forth between several authors. Helps avoid the PowerPoint slop attractor.
learning to prompt better via the dual use of practicing good business-writing techniques. Easy to do via the above: put a couple of business-writing books in context, then prompt the model to give you exercises that it grades you on.