Various token consumption counts & costs for my own future reference.
65k tokens (50k input / 15k output) costing $0.705 for a 1-hour coding session with Opus 4.6 in Claude Managed Agents
308,880 tokens and 15 RMB (~$2.20) generate a 15-second video in Seedance 2.0, so ~1.2M tok/min
?? tokens and $6 cost + 3.5 hours of amortised undergrad time: average cost per manuscript for a replication of Sakana’s AI Scientist, reported in early 2025. “The total cost for our main experiments, which included generating 10 ideas, running experiments for 7 ideas (from 12 total, including 2 seed ideas, of which 5 failed), and producing 7 manuscripts with reviews, amounted to only $42 USD.” This was lower than Sakana’s own claim of ~$15 per idea because the replication’s experiments were less complex. The replication authors estimated a 6-11x speedup: “we estimate that an undergraduate student would require at least 20 to 40 hours”
?? tokens and $13 per developer per active day average cost for Claude Code in Apr 2026 across enterprise deployments, or $150-250 per dev-month, with 90% of users below $30 per day
?? tokens and $2.40-28.50 per API call for OpenAI’s Deep Research API as tested by Artificial Analysis
100k to 270k tokens (according to GPT-5.5 extended thinking’s BOTEC) and $15–35 API-equivalent token spend for GPT-5.4 Pro to solve Erdos problem #1196 and convert the solution to a LaTeX math paper, at $30 / $180 per M token rates. The 80-minute, 55-page reasoning chain was the big cost driver; this output-heavy ratio is inverted for agentic workflows, where cache reads etc. dominate
0.8 to 8M tokens per “human-equivalent digital worker-day” in Epoch’s speculative BOTEC concluding that OpenAI in Oct ’25 could deploy a median of 7M digital workers (50% CI 1.4-28M, mean 62M) across their 480k H100-equivalents. This was based on GPT-5 typically using 0.1 to 1M tokens to do METR tasks that take a human an hour, like debugging a program or training a classifier
2 to 60M tokens for analyst-type tasks e.g. “convert excel models into dashboards, create charts for all our notes, build financial models and analyze company earnings, and much more”. I shared yesterday SemiAnalysis’ table below of their token spend vs labor costs on their workflows; given their true blended price of $0.99 per M tokens for running Opus 4.7 on agentic tasks, I assume for simplicity that dollar cost converts 1:1 into M tokens
700 million tokens weekly: Ege Erdil’s ballpark for the usage of a single continuously-running full-time agent. Ege himself said he clocks 1-10B tokens weekly
“Hundreds of millions of tokens” enabled Love Q&A Technology (Beijing) founder Li Jiayi and his team to compress the development cycle from “at least 6 months and a team effort” to 2 months for their interactive AI-powered toy’s supporting software
1 billion tokens consumed, $1,500 Anthropic bill (mix of Max 20x, overages, API direct) and $1,800 all-in including hosting etc enabled Jason Hoffman to ship “a business and 3 side projects” (judoka.ai, judoka.blog marketing site localised in 8 languages, payment infra, App Store approval) in 27 days. Jason, ex-CTO at Joyent, estimated this would all have taken 10-20k hours (“industry benchmark 5-6 years solo, 12-15 months with 5 junior engineers”) and cost $0.75-1.5M, so ~600x cost reduction
1 billion tokens in 7 days for Wave, “an AI simulation of 1000 VC investors”, to simulate 300 pitches
>1 billion tokens by an anon redditor in about 5 days, “working ~3hours per day of my time on ~4 HOBBY projects: (1) Unreal game development wasted the most tokens, I scrub and move to Unity with better results (2) Automation and backtesting of some trading strategies (3) Personal and (4) Webmaster AI agents”
1.7 billion tokens by another redditor “in the last 48 hours” as a one person company / solo dev
?? tokens and $3,476 in API cost for GPT-5 to find two novel zero-day vulnerabilities and produce exploits worth $3,694 “when evaluated in simulation against 2,849 recently deployed smart contracts without any known vulnerabilities”, for an average net profit of $109 per exploit. The median number of tokens required to produce a successful exploit declined by roughly two-thirds, from Opus 4’s 228k tokens to 4.5’s 78k
?? tokens and $2,200-10,000 cost to run ARC-AGI-3, leading to scores between 0.1% and 0.5%. They don’t say how much it would cost the human panel to do it
2 billion tokens consumed, $20,000 in API costs, and 140 million tokens outputted by a team of 16 Opus 4.6-based agents running for 2 weeks enabled Nicholas Carlini to write a Rust-based C compiler from scratch that can build Linux 6.9 on x86, ARM, and RISC-V, albeit with plenty of limitations w.r.t. production readiness. This “nearly reached the limits of Opus’s abilities. I tried (hard!) to fix several of the above limitations but wasn’t fully successful. New features and bugfixes frequently broke existing functionality”
?? tokens and also $20,000 cost for Mythos Preview let Anthropic do a thousand runs on OpenBSD leading to several dozen zero-day exploits, including the vulnerability in the “implementation of SACK that would allow an adversary to crash any OpenBSD host that responds over TCP”
5 billion tokens monthly per employee across 85 people is SemiAnalysis’ current consumption rate, corresponding to an annualised token spend of ~30% of employee compensation. (“This is power law distributed though, so there are team members running over 100B tokens a month.”) Daily token spend is charted below; colors likely correspond to individual users:
6.1 billion tokens in 30 days circa July(?) 2025 by Liu Xiaopai, who the Chinese internet at one point called “the world’s most prolific Claude user”, burning ~$50,000 on a $200/month plan. (The #2 user consumed ~2x the tokens (11.5B) at ~$20,000.) ~10B tokens per month was my impression of the high watermark in mid-2025. This yielded him a $1M revenue runrate in “pure margin”, 100x Beijing’s average disposable income, thru “vibe coding and hands-on management of more than a dozen successful AI products sold overseas”. More below on what Liu gets out of this
7+ billion tokens per day processed by an internal Cloudflare agent that does security reviews of their codebases, running on Kimi K2.5 at a ~$550k/year run-rate cost, which “has caught more than 15 confirmed issues in a single codebase”
10 billion tokens in 8 months on Claude’s Max plan ($920 total, >$15k API equivalent) let Kyle Redelinghuys work on “dozens of projects… from building an SMTP relay for email tracking to working on synthetic genomic data for prenatal testing research, analytics dashboards, profiling tools and various smaller utilities”, using Opus 95% of the time. In his busiest month he ran 201 sessions across 45+ projects totaling $5,263 in API-equivalent cost, all covered by the Max 20x plan
“Billions of tokens in Claude Code used for many PRs” by Axiom Bio: “Claude agents with MCP servers are core to our scientific work, directly querying databases to interpret, transform, and test data correlations, helping us identify the most useful features for predicting clinical drug toxicity”
?? tokens and $70,000 in Claude usage by one startup founder who “discovered a loophole in an A.I. tool made by Figma that allowed him to use this much thru a $20/month account”, which let him “build six software projects at the same time”
50 billion tokens a month in Jan 2026 by Strange Loop Canon’s Rohit Krishnan, who “runs 20 agents simultaneously” (more below)
100+ billion tokens a month by SemiAnalysis’ top users (I guess Jeremy and Malcolm, see below)
200+ billion tokens and ~$100,000 per month is roughly where the top 3 or so developers on the Tokscale leaderboard self-report
210 billion tokens in a week by a single OpenAI engineer, and >$150,000 in a month by a single Claude Code user at Anthropic from the same Kevin Roose NYT article
292 billion tokens org-wide in the 30 days ending mid-April 2026, either routed through Cloudflare’s AI Gateway or processed by their internal Workers AI agents, across 3,683 internal users (60% of all staff), leading to a +58% increase in weekly merge requests shipped org-wide
1 trillion tokens processed in the last 12 months is a bar 330 Google Cloud customers cleared as of late April 2026, of which 35 customers exceeded 10 trillion
44 trillion tokens in the last 30 days internally at Meta as of late March 2026
700+ trillion tokens monthly runrate processed by Google’s first-party models via direct API use by customers in Q1 2026 (“16B+ per minute” is the claim; this and a few other rates above are sanity-checked in the sketch after this list), up 60% vs the prior quarter
1,300+ trillion monthly tokens “processed across [Google] surfaces” in Oct 2025
4,200+ trillion monthly tokens China-wide in early 2026, >1,000x that of early 2024, in a passage about OpenClaw’s overwhelming popularity there. Apparently >85% of this is ByteDance’s Doubao model alone, which supports AI video generation, with usage doubling in 3 months
21,100 trillion tokens processed in 2025 is China’s officially reported total
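To keep the arithmetic in the list above honest, here is a minimal sanity-check sketch (Python) recomputing a few of the rates it implies: Seedance’s tokens-per-minute, Google’s “16B+ per minute” claim, the blended price implied by Cloudflare’s security agent, SemiAnalysis’ per-employee spend vs compensation, and Epoch’s tokens-per-worker-day range. Everything except the explicitly labelled assumptions (30-day months, steady usage, an illustrative $200k average compensation) comes from the figures quoted above.

```python
# Sanity checks of a few rates implied by the list above.
# Inputs are the quoted figures; anything marked ASSUMPTION is mine, not the sources'.

MINUTES_PER_MONTH = 30 * 24 * 60  # ASSUMPTION: 30-day month

# Seedance 2.0: 308,880 tokens for a 15-second video
seedance_tok_per_min = 308_880 / 15 * 60
print(f"Seedance: {seedance_tok_per_min / 1e6:.2f}M tokens per minute of video")   # ~1.24M

# Google direct-API runrate: 700T tokens/month vs the "16B+ per minute" claim
google_tok_per_min = 700e12 / MINUTES_PER_MONTH
print(f"Google API: {google_tok_per_min / 1e9:.1f}B tokens per minute")            # ~16.2B

# Cloudflare security agent: 7B tokens/day on Kimi K2.5 at a $550k/year runrate
cloudflare_price_per_M = 550_000 / (7e9 * 365 / 1e6)
print(f"Cloudflare agent implied blended price: ${cloudflare_price_per_M:.2f}/M")  # ~$0.22

# SemiAnalysis: 5B tokens/month/employee at their quoted $0.99/M blended price
annual_spend = 5e9 * 12 / 1e6 * 0.99
avg_comp = 200_000  # ASSUMPTION: illustrative average compensation
print(f"SemiAnalysis: ~${annual_spend:,.0f}/yr per employee, "
      f"{annual_spend / avg_comp:.0%} of an assumed ${avg_comp:,} comp")           # ~$59k, ~30%

# Epoch BOTEC: 0.1-1M tokens per one-hour METR task, 8-hour worker-day
low, high = 0.1e6 * 8, 1e6 * 8
print(f"Epoch: {low / 1e6:.1f}-{high / 1e6:.0f}M tokens per worker-day")           # 0.8-8M
```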
Token counts and costs not easily estimable:
Fiction: Michael Barr, CTO of Barr Group and a world expert on embedded systems, collaborated with Claude Code’s AI agent teams feature to finish and publish his novel Metacompiler in under 24 hours. Its core idea is based on Ken Thompson’s Turing Award lecture “Reflections on Trusting Trust”. Opus 4.7’s totally unsubstantiated guess is that this cost $200-1,000 in API cost. Barr:
I have spent thirty years working with embedded software: the firmware in your car’s throttle controller, your pacemaker, your insulin pump. In 2013, I served as the lead software expert witness in a Toyota unintended acceleration trial — the only such case to reach a jury. My team spent eighteen months examining millions of lines of engine control code. The jury found Toyota liable.
That experience planted a question I could not shake: What if vulnerabilities didn’t exist in individual product designs, but instead in a compiler that built every product’s code? If you asked me to defend any technical claim in the novel under cross-examination, I could do it. I’ve done it before.
For years I accumulated notes. Character sketches and draft chapters. Research files on compiler design, signals intelligence, medical device vulnerabilities. I fell in love with the story arc as well as the main character. I possessed and provided the technical expertise to invent such a plot. What I did not have was the ability to sit down and write a 100,000-word novel.
Math: “an internal version of GPT‑5.5 with a custom harness helped discover a new proof of a longstanding asymptotic fact about off-diagonal Ramsey numbers, later verified in Lean”
Bio: “Derya Unutmaz, an immunology professor and researcher at the Jackson Laboratory for Genomic Medicine, used GPT‑5.5 Pro to analyze a gene-expression dataset with 62 samples and nearly 28,000 genes, producing a detailed research report that not only summarized the findings but also surfaced key questions and insights—work he said would have taken his team months”
SWE at Anthropic: Boris Cherny running 5+ agents shipped 300+ PRs in Dec 2025
SWE at OpenAI: Ryan Lopopolo in Feb 2026 estimated a 10x speedup: a team of 3 engineers drove Codex agents over 5 months to “build and ship an internal beta of a software product with 0 lines of manually-written code. The product has hundreds of internal daily users including power users and external alpha testers. It ships, deploys, breaks, and gets fixed”. 1M LoC, 1,500 PRs (3.5 PRs per engineer-day, i.e. a quarter-Cherny; arithmetic unpacked in the sketch after this list)
OpenAI Comms team “used GPT‑5.5 in Codex to analyze six months of speaking request data, build a scoring and risk framework, and validate an automated Slack agent so low-risk requests could be handled automatically while higher-risk requests still route to human review”
OpenAI Finance team “used Codex to review 24,771 K-1 tax forms totaling 71,637 pages, using a workflow that excluded personal information and helped the team accelerate the task by two weeks compared to the prior year” (same source)
SemiAnalysis’ Jeremy used Claude Code over 3 weeks, spending up to $6,000 a day, to build an internal dashboard that let their analysts “check micro-regions with power deficits and surpluses”, scraping data from “every single power plant in the US, every single transmission line above a certain voltage, creating a map of the entire US grid as well as demand sources all from various public data sources”. Supposedly they showed this to some of SA’s energy-trader customers, who said “wow, this is better than XYZ company who have 100 people and have been working on this for a decade”
SemiAnalysis’ Malcolm, previously an economist at a major bank with a 100-200-person economics department, did “a phantom GDP analysis and 2,000 evals that would’ve taken these 100-200 economists a whole year”: “he piped all this data eg FRED, employment reports, etc from various APIs, ran regressions, looked at the impact of various economic revolutions on the economy from a deflationary and inflationary POV—the BLS has an entire set of 2,000 tasks which he looked at and tagged which ones can be done by AI and which can’t, grading them across a rubric, created a metric of things that can be done with AI, hence modelling deflationary aspect of it”. I’m guessing this was inspired by the Citrini essay’s mention of “ghost GDP”
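The “quarter-Cherny” quip in the Lopopolo item above packs several numbers together, so here is a minimal sketch of the implied rates; the day-count conventions are my assumption, since neither source states theirs.

```python
# Rough check of the "quarter-Cherny" comparison above.
cherny_prs_month = 300                      # Boris Cherny, Dec 2025 ("300+")
team_prs, engineers, months = 1_500, 3, 5   # Ryan Lopopolo's team

per_engineer_month = team_prs / engineers / months   # = 100 PRs per engineer-month
print(f"Lopopolo team: {per_engineer_month:.0f} PRs per engineer-month")
print(f"Cherny:        ~{cherny_prs_month} PRs per month")
print(f"Ratio: ~{per_engineer_month / cherny_prs_month:.0%}")
# -> ~33% on a per-month basis; the quoted 3.5 PRs per engineer-day is closer to
#    a quarter of Cherny's ~14 PRs per working day (300 / ~21 workdays), so the
#    honest range is "a quarter to a third" depending on how days are counted.
```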
Nikunj Kothari describes Moloch, basically:
Waking up and checking what your agents produced overnight is the first thing now. Before coffee. Before texts. You open your laptop and grade homework you assigned in your sleep. Some of it is good. Most needs rework. But you start shipping a plan before you sleep just so you can wake up to more code written overnight. Saturdays became uninterrupted build windows. No meetings, no Slack, twelve hours of you and your agents. Sunday morning X is all terminal screenshots and shipping receipts. “What’d you ship this weekend?” replaced “what’d you do this weekend?”
The anxiety is rational, which is why it sticks. Every week some new benchmark drops that makes last month’s workflow feel prehistoric. Codex ships overnight processing. Opus gets faster. Context windows double. None of it reduces the pressure. It multiplies it. You can do more now. And someone already is. The window to be first at anything feels like it’s shrinking by the day. Literally, by the day.
I replaced Netflix with Claude Code. I lie in bed thinking about what I can spin up before I fall asleep, what can run while I’m unconscious. Reading a novel feels indulgent now. Watching a movie without a laptop open feels wasteful. This voice in my head that says “something could be running right now” just doesn’t shut off. I’m not even building a company. I’m just addicted to building my random ideas.
Everyone here knows they should step away more. That’s not the problem. The problem is what your brain does when you try. I still take aimless walks. The agents come with me now.
Kevin Roose is skeptical:
I talked to several other tokenmaxxers about what they’re doing with all those tokens. Most were engineers or hobby programmers who were building and maintaining large, complex pieces of software using coding agents running in parallel. They said, by and large, that A.I. coding tools were making them more productive. But some also framed their use of A.I. as a strategic move — a way to signal, to their colleagues and bosses, that they’re keeping up with the times, as the era of human coding appears to be coming to an end.
Are any of these tokenmaxxers producing anything good? Or are they merely spinning their wheels, churning out useless code (and wasting valuable processing power) in an attempt to look busy? Time will tell. Maybe the A.I. addicts of today will be the 100x engineers of tomorrow. Or perhaps it’s just productivity theater — a glimmering tower of tokens, constructed by the competitive and fearful, that will topple as soon as we understand what really makes for useful work. Either way, we’re going to need a lot more data centers.
Nikunj again, seeing a different reality than Kevin:
A founder I angel invested in has not updated his board in a few months. His product works, revenue is growing, but a new Claude feature does 80% of what he spent two years building. He does not know what to tell them. So, he has resorted to saying nothing.
YC founders refresh Anthropic’s changelog the way traders watch earnings calls. Every release is a potential kill shot. Anthropic added $11B in annualized revenue in the last month and OpenAI just closed $122B at an $852B valuation. These companies ship features monthly that erase startups overnight. …
Layoffs arrive in quiet monthly waves now. No headlines, just Slack messages. “Restructuring.” Everyone knows what it means. The agents are now doing what teams of people used to.
Liu Xiaopai on what billions of tokens a month buys him:
Afra: I’m curious, do you believe token consumption correlates with output? Why use dollars spent as your metric?
Liu Xiaopai: There’s definitely a positive correlation. … though not linearly. If you consume 10x the tokens of someone else, your output might only be 2x theirs, not 10x. Many people think you should conserve token resources. But what’s the cost of conserving computational resources? It means you, as a human, consume more time. Under Claude Code’s initial policies, I didn’t care about conserving computational resources—I cared about conserving my time and energy, freeing myself for more creative work. At least in early July, Claude Code’s token limits were essentially unlimited, so my entire usage philosophy aligned with what it permitted. I didn’t care how much electricity or how many tokens it consumed. I only cared about saving my time, energy, and attention.
Rohit Krishnan on what 50 billion tokens a month buys him:
Rohit: I’m not doing dramatically different things but the friction is gone. Two years ago, I would be looking at a query, counting the tokens, thinking, should I send this? Ten thousand tokens felt significant. Now I just ask. The funny thing is that most of the growth isn’t coming from the queries I planned to run. It’s coming from the ones I wouldn’t have bothered with before, because the cost, time and effort were too high. I built a monitoring tool to track my usage. …
Rohit: I have three screens. On one, Codex is generating a small application that lets me play music on my computer keyboard. On another, my prediction agent is running, comparing my Polymarket forecasts to daily news. In Telegram, I have two conversations open: one with Morpheus, my OpenClaw agent, and one that handles day-to-day admin. And I have a long-running project called Horace working quietly in the background, which is my attempt to get AI to write better. This is my normal. But none of this was normal 18 months ago. The thing that actually changed my behavior most wasn’t the power; it was the interface. I’ve tried to-do list apps for 20 years. I have never stuck with one for more than four days. They all require me to change my behavior. Morpheus doesn’t. I’m walking somewhere, I think of something, I fire it into Telegram. It reads my email history, compares it to what I’ve said I want to do, and tells me what I should be working on.
GPT-5.5 comments on the above, in my own words:
good raw ethnography (a “natural history of token abundance”), but an unclean economic dataset mixing together at least 6 quantities all called “tokens”
since token volume is only weakly coupled to value (Dylan Patel’s claims notwithstanding), you should instead ask “which workflows convert token abundance into durable, validated output?”
Nicholas Carlini’s Rust compiler example is good, Cloudflare too. The SemiAnalysis examples are all “very rhetorically potent” and misleading for cases that incur large human costs in expert validation and downstream correction
examples cluster into a few categories:
Personal friction removal: Rohit, Liu, Kyle, hobbyists
Artifact production: compiler, novel, dashboards, startup products
Scientific/research automation: AI Scientist, gene expression, Ramsey proof
Security/eval search: smart-contract exploits, OpenBSD/Mythos, ARC-AGI
Firm-scale workflow rewiring: SemiAnalysis, Cloudflare, OpenAI Finance/Comms, Axiom
Platform/macro throughput: Google, Meta, China-wide token numbers
(slop warning)
GPT-5.5, which is not at all AGI-pilled, guesses annual tokens processed through 2030: