Claude Code has radically changed what it means for me to be a programmer. It’s made me much more productive: I’m able to get work done in hours that would previously have taken me days. On the surface it probably looks like I’m only about 1.5x more productive in terms of time to deliver results. But coding agents make it cheaper to put in the effort to write high-quality code, so the code I can write in the same amount of time is now much better. Factor that in and it’s easily a 3x-5x increase in total productivity, assuming you agree with me that quality code is valuable.
The natural questions to ask are “how do I do it?” and “can you do it, too?”
Let’s tackle these questions in order.
The Mechanics of 5x Engineering
Arguably, the way I achieve 5x productivity at programming is pretty simple: I use Claude Code to do almost everything!
But the reality is more nuanced, because lots of people are also using Claude Code to do almost everything and aren’t seeing similar productivity gains. I’ve learned a lot about how to work effectively with Claude in the 6+ months since its release. Here’s how I do it:
At a high level, I think of myself as pair programming with Claude. I’m the navigator and Claude is the driver, though occasionally I ask Claude to help me navigate, as during planning, and I sometimes do a little driving in the form of jumping in to change small things in code.
I find this model superior to other ways of trying to use coding agents. In particular, I dislike letting agents run and submit a PR. I don’t trust the changes work, even if CI passes, and the iteration time is longer. Rather than have my attention repeatedly pulled in multiple directions, I want to work on one thing at a time, get it done, and then move on to the next one, just faster than if I was coding without an LLM.
I almost always start with a plan, as in using planning mode. I give Claude some vague instructions about what I want and let it get familiar with the code base. It will come back with something. I then iterate with it on the plan until it looks basically right.
The most valuable thing I have that Claude doesn’t is context—context about the business, our users, their needs, how a feature should work, and, to a lesser extent, our engineering culture (it’s able to infer a lot about this from our code!). This is the most important stuff for me to tell Claude about when it’s planning.
If I’m working on something big or complex, I tell Claude I want to break the work into phases because I want to stop after each phase to create a PR in a stack of PRs. This helps keep it from getting lost because it’s trying to do too much, and because I actually do create a stack of PRs, which makes life easier come review time.
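The branch mechanics behind a stack like this are plain git, whatever tool you use to open the actual PRs. A minimal self-contained sketch (branch names are hypothetical, and the commits are empty placeholders):

```shell
set -e
# Throwaway repo so the sketch is runnable anywhere.
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "base"

# Phase 1 branches off main; each later phase branches off the previous one,
# so each PR's diff contains only its own phase.
git checkout -q -b phase-1
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "phase 1: data model"
git checkout -q -b phase-2
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m "phase 2: API endpoints"

# The PR for phase-1 targets main; the PR for phase-2 targets phase-1.
git log --oneline main..phase-2
```

When phase-1 merges, the phase-2 PR retargets to main and its diff stays small, which is what makes review time easier.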
Sometimes I’m not sure if the initial plan Claude comes up with is good, so I ask questions. Claude can be a little overeager to make new plans, but if I keep pressing, it will explain its decisions and, if I think they are bad, I can work with Claude to come up with a better plan.
Speaking of which, I often dislike the initial plan. Sometimes I have an idea in mind of what we should do instead, but I will often feign ignorance and ask Claude for alternatives to avoid biasing it. Sometimes it comes up with a better idea than me, but usually it either comes up with the idea I had or makes a suggestion that helps me make my own idea better (and then I tell Claude to change the plan to do that).
Once the plan is ready, I have Claude execute it. During this part I usually go do something else while it works. This might be working on another coding task, dealing with email or other messages, or catching up on Twitter if I can’t hold the context of two complex things in my head at once.
When Claude is done, I carefully review its work. I trust nothing. I look over the diff and make sure I understand every part of it. If anything seems weird or I don’t understand something, I ask Claude what’s up with that part of the code.
Claude is biased towards action, so when asking questions, I often need to remind it that I’m just asking, not telling it to do anything yet. I’ll say something like "don’t make any changes yet, but why is [some part of the code] like [whatever it’s like]?"
I also ask Claude to plan changes before making them. This helps avoid a lot of rework and waiting for it to make changes that I’m going to undo. I sometimes go as far as putting Claude back into planning mode, but usually I just do planning in normal mode by being explicit with a prompt like "don’t change anything yet. look at [part of the code] and give me options for how we might [make it better in some specific way]".
I treat this process like a mini planning session, even if Claude isn’t in plan mode. So I ask lots of questions, try to understand tradeoffs, and come to a clear conclusion on what to do next before letting Claude write more code.
I’ll also sometimes go in and make edits. I can’t consistently get Claude to match my style preferences even with fairly aggressive prompting in CLAUDE.md, so I’ll go in and fix it manually. These are always small changes, though. For anything that’s going to take me more than a couple minutes I get Claude to do it.
When I do make edits, I always make sure to tell Claude I did this. Otherwise it’s liable to undo my changes. For this reason I often try to leave style edits until the very end, after I expect to make no more edits with Claude.
I find it useful to commit code as I go. This effectively gives me checkpoints in case Claude goes off the rails and does something weird that’s hard to recover from. I’ve regularly been saved from manual cleanup by being able to run a quick "git checkout ." to throw away everything since the last checkpoint.
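The checkpoint pattern is ordinary git. A self-contained sketch (file name and contents are made up):

```shell
set -e
# Throwaway repo standing in for the real project.
repo=$(mktemp -d) && cd "$repo"
git init -q -b main

# Checkpoint: commit the known-good state before letting the agent loose.
echo "good version" > app.txt
git add app.txt
git -c user.email=demo@example.com -c user.name=demo commit -q -m "checkpoint before agent run"

# Simulate the agent going off the rails and mangling the working tree.
echo "weird broken rewrite" > app.txt

# Recovery: discard all uncommitted changes to tracked files.
git checkout .
cat app.txt
```

Note that "git checkout ." only reverts tracked files; brand-new files an agent created stay behind and need a "git clean" (or manual deletion) as well.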
While I usually go do something else while Claude works, sometimes I find it useful to sit and watch it. I especially find this useful when I have low confidence that it will do the right thing, or when I’ve seen it wasting time doing extra steps after making code changes. Then I can interrupt it and correct course or stop it early and get on to the next thing.
I always test Claude’s changes, just like I’d test changes if I made them directly. This should be obvious, but I own the code, not Claude, so it’s up to me to test it and make sure it works.
That doesn’t mean I can’t ask Claude for help, in particular to write automated tests. I’ll usually give it a vague description of what I want tested and what cases I care about, and it will code them up.
The tests Claude produces aren’t perfect, but writing tests is tedious, and I otherwise don’t write enough of them. Claude reduces the friction enough that I produce a larger number of useful tests in my code, even if those tests don’t meet the same quality bar I set for the application code.
I also test the code for real. I run it in dev and staging. Claude makes mistakes all the time, and I need to be sure the code actually does what I think it does. It’s not uncommon to discover that code looks correct but does something subtly different than intended or behaves contrary to expectations I didn’t realize I had until seeing the code in action.
I use Claude to review the code. I start a new session and ask it to review the changes. I don’t tell Claude that I wrote the code with its help, because that can bias it. It can look at the diffs in the commit, read more code, and spot issues the original Claude instance missed.
In practice I also use other automated code review tools that run as part of CI, and sometimes I skip local Claude review because of that. The point is that LLMs are great at noticing mechanical mistakes in code that humans often miss, and I rely on a fresh set of LLM eyes to catch those before they make it to production.
I use Claude to refactor the code it produces. The initial run is usually full of code smell or exposes latent smell, and this is where a huge chunk of my productivity boost comes from. Rather than shipping more technical debt on top of existing debt, I’m able to continually pay down debt with every PR because the cost of doing refactors with Claude is dramatically lower than doing them by hand.
For example, code organized into the wrong directory structure? That’s a pain to fix by hand. Claude can do it in a few minutes while I get a fresh cup of coffee.
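Mechanically, that kind of restructuring is mostly "git mv" plus fixing up imports, which is exactly the tedious-but-simple work an agent is good at. A hypothetical sketch of the file-moving half (the flat layout and file names are invented for illustration):

```shell
set -e
# Throwaway repo with a hypothetical flat layout that should be split by domain.
repo=$(mktemp -d) && cd "$repo"
git init -q -b main
mkdir src
echo "billing api"    > src/billing_api.py
echo "billing models" > src/billing_models.py
echo "users api"      > src/users_api.py
git add . && git -c user.email=demo@example.com -c user.name=demo commit -q -m "flat layout"

# Move files into per-domain directories; git mv stages the renames,
# so history follows the files.
mkdir -p src/billing src/users
git mv src/billing_api.py    src/billing/api.py
git mv src/billing_models.py src/billing/models.py
git mv src/users_api.py      src/users/api.py
git -c user.email=demo@example.com -c user.name=demo commit -q -m "reorganize by domain"
git log --oneline --follow -- src/billing/api.py
```

The remaining work, updating every import site to the new paths, is the part that's genuinely painful by hand and a few minutes of agent time.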
Sometimes I need to talk with Claude about the refactor and plan it. The process looks like this:

Me: "hey, this part of the code looks weird to me. what’s up with it? don’t try to fix it, just explain to me why it’s like this"

Claude answers.

Me: "okay, what are some options for how we could improve it"

Claude proposes some solutions.

Me: "let’s go with option B, but change it to include feature 2 of option C" or "actually, i don’t like any of those, let’s do [something better] instead"
Again, if I need more information to make a decision, I ask for it.
Frequently I have to explicitly ask Claude to look at docs or otherwise search the web for it to get correct information about how, for example, a library or API works. This is basically always worth the time it takes.
Finally, I never get mad at Claude. Sometimes it’s kinda dumb or includes a change that I don’t notice until a reviewer asks me about it and then I look dumb because I didn’t know it was there, but that’s my fault. As I said, I own the code, not Claude, and that means I’m both responsible and accountable for what Claude does. I find that this mindset is critical to using Claude Code effectively.
Can You Become a 5x Engineer?
Yes. Probably. Maybe. If you use Claude like I do, the quality of the results you get will depend heavily on how good of a programmer you are.
From what I can tell talking to other people, I work well with Claude because I have a lot of programming experience—over 30 years’ worth! I’ve developed strong intuitions about what good code looks like, and have clear ideas about what patterns will and won’t work. I can steer Claude towards good solutions because I know what good solutions look like. Without this skill, I’d be lost.
Now, I earned my intuitions the hard way, by spending years writing code by hand, carrying my punch cards uphill both ways. If you’re in that situation, great, you can pair with Claude Code the way I do to good effect. If not, is there anything you can do?
I think so. You can learn by talking to Claude and asking it to explain things to you. If you develop a few basic motions, like asking about tradeoffs and asking Claude to explain why it thinks some change is a good idea, you can use it to bootstrap yourself and gain a lot of experience rapidly. This might not be a total replacement for years of experience, but it’s better than nothing, and realistically, it’s what you’re going to have to do if you’re just starting out and want to get productive enough to get and stay employed.
Because if there’s one thing you should come away from this post with, it’s not that there’s a bunch of ways to use Claude or other coding agents to increase your productivity, it’s that you must start using these tools to become more productive so that you aren’t left behind.
Your process description sounds right (like, the thing I would aspire to, although I don’t consistently do it – in particular, I’ve identified it’d be good if I did more automated testing, but haven’t built that into my flow yet).
But, you don’t really spell out the “and, here’s why I’m pretty confident this is a 5x improvement.”
A few months ago I’d have been more open to just buying the “well, I seem to be shipping a lot of complex stuff”, but, after the METR “turns out a lot of devs in our study were wrong and were actually slowed down, not sped up”, it seems worth being more skeptical about it.
What are the observations that lead you to think you’re 5x? (also, 5x is a somewhat specific number, do you mean more like ‘observations suggest it’s specifically around 5x’ or more like ‘it seems like a significant speedup, but I can tell I’m still worse than the ’10x’ programmers around me, and, idk, eyeballing it as in the middle?)
(I don’t mean this to be like, super critical or judgmental, just want to get a sense of the state of your evidence)
My 5x number is because I can do roughly as much work as I used to do in a week in a day, if I hold the quality bar steady. In reality, I wouldn’t normally end up being able to do that, and would trade off quality for speed to impact and just live with tech debt to be paid down later. With Claude, I essentially never accumulate tech debt (other than the kind of debt that creeps up because I don’t notice it) and actively pay it down with each PR.
In terms of deliverables, though, it’s as I say more like a 1.5x improvement. The real productivity gains are coming from putting more effort into quality than I’d otherwise get permission to.
(Of course, I could just be wrong because I don’t have great ways to measure counterfactual code quality.)
Do you track your subjective experience of tech debt? If I stop by in a year’s time and ask for your measurements of tech debt accumulated between now and then compared to previous years, will you be able to tell me whether you still feel the improvement? Or do you have no data about previous years, and haven’t started keeping notes or other metrics about the improved tech-debt feelings either? Or something else?
I’m not keeping a quantified record, if that’s what you mean, but yes I have a strong sense of how much debt there is in the code base and how it’s been trending.
Yeah it occurs to me reading this that, while I have used AI to code easy things faster, and sometimes code “hard things” at all (sometimes learning along the way), I haven’t used it to specifically try to “code kinda normally while reducing more tech debt along the way.” Will think on that.
This is basically the same development pattern I have, although I use Claude with the vscode Continue extension instead of standalone.
I’m unsure of how to quantify it, and the METR study from earlier this year makes me question myself. Yet, in domains I have strong familiarity with, my output seems higher quality and ships faster with CLI tools, especially Claude Code.
Pure vibe coding is still very shaky, meaning experiments where I only give feature-level requests and don’t look at the code. I can’t get past low-to-moderate levels of complexity before things start to break. I assume this will be the case for at least the next year, maybe two, barring a huge leap.
This is roughly my expectation as well.
I like to use the PR agents in some cases. (But I still manually check out those branches and rebase, split the commits, or rewrite some stuff.)
- They let me spin off tasks when I’m on mobile.
- It is easier to do multiple parallel attempts on the same task when I know the output will probably suck. And not gonna lie, OpenAI’s codex cloud has a very lenient compute limit, so I also feel like I’m saving money this way.
- They live in (other people’s) containers, so I don’t need to worry about multiple agents colliding with each other. I know git worktrees exist, but juggling which worktree is on which branch turns out to be somewhat annoying too.
- They are good for queueing up tasks that I don’t expect to have the bandwidth to start working on today. I can have the agents make the PR today and forget about them until a few days later.
Thanks! I’ve yet to see much value in this approach. I find the time to run the agent to generate the code is pretty short, and what it produces in these unmonitored runs takes more work for me to clean up than just iterating with Claude directly. But, I do expect that the tech will keep improving and that eventually this will be the superior workflow!
It may be the type of work that we are doing differs then.
Also, you don’t seem too bothered that running claude code implies a responsibility to review the code soon-ish (or have your local codebase get increasingly messy). The fact that I don’t need to worry about state with PR agents means it is more affordable to spin up more attempts, and because more attempts can be run simultaneously, each individual attempt can be of lower quality, as long as the best attempt is good. Deciding that the code is garbage and not worth any time cleaning up is much faster than cleaning it up, so in general I don’t find the initial read-through of the n attempts to take that much time. In the end I still only spin up codex on desktop if I think the task has a reasonable chance of being done well, which really depends on the specific task size/difficulty/type (bug fix, refactor, additions). It’s also likely that claude code works better for you because you’re more experienced and can basically tell claude exactly what to do when it’s stuck.
I strongly suspect this is a lot of what makes my workflow work well for me. My problem is rarely figuring out what broadly needs to be done or how I want it done, and mostly just actually making the changes I want, which is far more tedious if I have to do all the typing.
Some details that might be relevant (in that I can imagine you’d get different results if the answers changed):
- What language(s) is your codebase in?
- If it’s an optionally-typed language, how much do you use types?
- How big is it?
- How familiar are you with it?
- How big are the changes you typically have to make, across how many files?
- How thorough are the existing tests?
- How much of what you do is fixing bugs versus implementing features versus refactoring versus other?
- Something like… how interesting is the code you write / how much boilerplate is there? Like, I expect different amounts of help from the agent if a task looks like:
  - Copy the 150 lines across five files that were used to implement this other feature, and change a few parameters.
  - Integrate with this new external API.
  - Improve performance of serializing and deserializing this data structure, when it’s streamed over a slow connection.
  - Change the underlying storage of this particular type from a search tree to a hashmap (most places we use the type won’t be affected, but a few places where we access internals need to be updated).
(Or if you think some of these aren’t relevant, I’m interested to hear that too.)
How likely do you think this quality improvement is to persist long-term? Are you able to allocate more time to quality because you’ve been sped up on the "core" part of development while expectations haven’t been raised accordingly? When organizations realize they can push for more output, development timelines might be forced to compress, and quality may drop back to current levels.
Alternatively, do you think paying down or preventing technical debt is quicker with LLM assistance than otherwise? I mean as a relative cost compared to building out the specific features.
By the way, what IDE are you using with Claude?
Oh sure over time expectations will change and the free lunch will end. Right now I get the benefit of Product just being happy that Engineering finally does something that looks like hitting timelines at all. I’m sure the situation is already different at some other companies. I also have a lot of latitude to work on tech debt because of my high degree of seniority. In some sense my job is to pay down tech debt and otherwise improve the engineering organization to make the company machine better; the feature shipping is incidental.
I don’t use an IDE. My setup is tmux with claude running in one pane, nvim in another, a bash shell in a third, and another bash shell in a fourth where I run our local development environment (it prints logs to console I sometimes need to see).

it helps with some tasks, not with others. it’s more fun, especially for moving-files-around sorts of things. i am not seeing an overall improvement—the bottleneck was always code review, and context switching, and it’s not at all close to helping with those activities.
it uses much less mental effort. this is enticing, but the tradeoff is a lack of flow state. on the other hand, it’s way easier to get started.
i have spent hours prompting and re-prompting, making progress, feeling like i was flying, only to be interrupted by api limits, and then solve the problem in a normal unadorned text editor in minutes. well, claude laid the groundwork. you can’t argue with that.
i thought it was great for writing tests. then i tried it myself. i prefer my tests—they are more self-documenting, and i know they cover important cases.
it is noticeably improving.
I use Claude to generate code that I fix by hand. It is still less work than writing it myself. I am mostly using it on hobby projects (example) that probably otherwise wouldn’t get done.
Should it be "5x a junior-to-mid-level engineer", not all engineering work? I found Claude Code really struggles to implement anything more complicated than simple data mapping, APIs, or similar. Concrete examples where Claude Code didn’t work well and instead made me slower are arithmetic coding and PPO. Also, see this research from METR that reaches a similar conclusion.
Claude Code doesn’t work that well if you’re not an experienced programmer. I mean, it works okay, but it has no taste, so it just produces syntactically correct stuff that is only randomly useful by default. It takes active steering to get good code out of it.
Also, I’m rarely trying to solve algorithmic problems, I’m trying to build production software, which is mostly about managing abstractions and plumbing different systems together to produce something useful to customers. All the hard algorithm work happens somewhere else by someone else because, while it’s essential, it adds very little marginal value for our customers. If algorithms to do something don’t exist, the solution is often to just wait for someone to figure it out, then build a feature on top of them. When I do get to work on algorithms, it’s usually solving complex concurrent execution problems, which Claude is okay at helping with but not great on its own. Luckily, I mostly design smartly to rely on systems that manage these details for me so I don’t need to work them out all the time.
Given what Gordon said, I’d amend that to "Gordon-level very senior engineer", not junior-middle (not sure where you got that from the OP?)
I am a programmer, and I’m extremely skeptical of claims like this.
It seems to me your post outlined your position pretty well, but I feel I’m still lacking reasons to believe it myself if I didn’t already believe it. Is there anything you could say that would convince someone who didn’t already believe you?
Sure. I mostly wrote this post because I wanted to share my process, not because I wanted to defend the headline claim of a 5x productivity improvement, so I skipped over that. I could have written a version of this post that didn’t make a specific claim about the size of the productivity gains; maybe I should have, rather than stating my belief, since that’s the thing people want to object to the most!
There probably is some argument I could make, and it would probably produce a more accurate number than my 5x estimate. I’m just not very interested in making it, given that I and approximately everyone else have high confidence that AI is making programmers more productive.