Vibe Coding Is Dead: Welcome to Software Mining
Epistemic status: Speculative but directionally serious. Drawing on real trends to name something emerging but undertheorized.
Karpathy coined “vibe coding” in February 2025 [1]. Describe what you want, accept the output, iterate by feel. Collins Dictionary made it Word of the Year [2]. Then the data came in.
METR’s RCT: developers using AI were 19% slower while believing they were 20% faster [3]. CodeRabbit: AI co-authored code had 2.74× more XSS vulnerabilities [4]. A multi-university paper argued vibe coding is killing open source [5]. Prompt-and-pray doesn’t work.
So AI can’t code? Wrong. AI can’t vibe. Take the human out, let the test suite decide, and you get something that actually works: software mining.
The Analogy
The trick Bitcoin discovered: you don’t need to understand the solution, you just need to verify it cheaply.
Software mining applies the same trick to code. Generate candidates, run the test suite, keep the survivors. The human writes the tests, not the code.
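A minimal sketch of that loop, with a small random pool standing in for the LLM (the candidate pool, the test cases, and the budget are all illustrative):

```python
import random

# Illustrative stand-in for an LLM: candidate implementations of abs(),
# most of them wrong.
CANDIDATES = [
    lambda x: x,
    lambda x: -x,
    lambda x: x if x >= 0 else -x,
    lambda x: x * x,
]

def passes_tests(fn):
    # The human contribution: a plain test suite, nothing else.
    cases = [(3, 3), (-3, 3), (0, 0)]
    try:
        return all(fn(x) == want for x, want in cases)
    except Exception:
        return False

def mine(budget=1000):
    # Generate, evaluate, keep the first survivor.
    for _ in range(budget):
        candidate = random.choice(CANDIDATES)
        if passes_tests(candidate):
            return candidate
    return None

winner = mine()
```

The generator doesn't need to understand which candidate is right; the test suite decides, and the loop only has to be cheap enough to run many times.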
Claude Code authors ~4% of GitHub commits [6]. Autonomous multi-file changes, test runs, retries on failure, task horizons in days. OpenClaw adds orchestration: heartbeat scheduling, overnight crons, self-installing skills [7]. Users wake up to finished code.
They are not chatbots.
They are mining rigs.
Proof of Concept: AlphaEvolve
DeepMind’s AlphaEvolve [8] mined a matrix multiplication algorithm that beat a 56-year record, then mined an optimization that sped up its own training. It’s still producing: this month it improved bounds on five classical Ramsey numbers [9]. Imbue’s Darwinian Evolver [10] and a wave of open-source clones confirm the paradigm generalizes. Not vibed. Mined.
Hash Rate Economics
METR showed human-in-the-loop is slower than no AI at all [3]. Remove the human, automate evaluation, and throughput is bounded only by per-candidate inference cost and test-suite execution time.
Coding scores roughly doubled in 18 months: 49% on SWE-bench Verified to 88% on Aider's code-editing benchmark, different benchmarks but the same trajectory [11][12]. Models get cheaper by the month. For algorithm optimization, generating a thousand candidates already beats hand-crafting one. That threshold moves down every quarter.
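The economics fit in a few lines. All numbers below are hypothetical placeholders, not measured prices:

```python
def mining_cost(n_candidates, inference_usd, eval_seconds, compute_usd_per_hour):
    # One mining run: LLM inference cost plus test-suite compute.
    eval_usd = n_candidates * eval_seconds / 3600 * compute_usd_per_hour
    return n_candidates * inference_usd + eval_usd

# Hypothetical: $0.05 per candidate, a 30-second test suite at $2/hour.
run_usd = mining_cost(1000, inference_usd=0.05, eval_seconds=30,
                      compute_usd_per_hour=2.0)
# Compare run_usd against the loaded cost of the engineer-hours it replaces.
```

When a thousand-candidate run costs less than the hours a human would spend on one attempt, mining wins, and both inputs to that formula are falling.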
What Gets Mined
The bottleneck is the evaluation function. Bitcoin’s is trivial. Hash the block, check if the number is small enough:
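In code, the whole evaluation is a hash and a comparison. A toy sketch: real block headers have a fixed 80-byte layout and a vastly harder target:

```python
import hashlib

# Toy difficulty: the top 8 bits of the double SHA-256 must be zero
# (real Bitcoin demands dozens of leading zero bits).
TARGET = 1 << 248

def check(header: bytes, target: int = TARGET) -> bool:
    # Double SHA-256, interpret as an integer, compare against the target.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "big") < target

def mine_nonce(prefix: bytes, target: int = TARGET) -> int:
    # Try nonces until one hashes below the target.
    nonce = 0
    while not check(prefix + nonce.to_bytes(8, "big"), target):
        nonce += 1
    return nonce
```

Finding a solution takes brute force; verifying one takes a single hash. That asymmetry is the whole trick.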
Software mining needs something richer. For each candidate program, score it automatically. Does it pass the tests? Is it fast? Is it safe? Keep it only if it clears the bar:
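One possible shape for such an evaluator, scoring correctness and speed. A sketch only: the `tests` format, the time budget, and the scoring rule are illustrative, and real pipelines would sandbox the candidate first:

```python
import time

def evaluate(fn, tests, time_budget_s=0.1):
    """Return a score for a candidate, or None if it misses the bar."""
    # Correctness: every test must pass; crashes count as failure.
    try:
        if not all(fn(x) == want for x, want in tests):
            return None
    except Exception:
        return None
    # Performance: reject candidates slower than the budget.
    start = time.perf_counter()
    for x, _ in tests:
        fn(x)
    elapsed = time.perf_counter() - start
    if elapsed > time_budget_s:
        return None
    # Among survivors, faster is better. (Safety checks such as sandboxing
    # and security linting would slot in here; they are elided.)
    return 1.0 / (elapsed + 1e-9)
```

Everything downstream, which candidates survive and therefore what gets built, is determined by this function. That is why writing it well matters more than writing the code it judges.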
There is a deeper frontier, though. Algorithm optimization, bug fixing, performance tuning: wherever the test suite is the judge, software mining already works. But you cannot yet mine product-market fit, a novel research direction, or the judgment call that a problem is worth solving at all.
This is where humans remain in the loop, for now at a higher level. Not writing code, not judging code, but choosing the search space. Writing the evaluation function is the new engineering. Choosing what to evaluate is the new entrepreneurship. The first person who figures out how to mine that layer too changes everything.
The inversion: vibe coding bet on human taste + AI generation. Software mining bets on automated evaluation + AI generation.
Human moves from aesthetic judge in the inner loop to evaluation engineer in the outer loop.
Difficulty Adjustment
Bitcoin adjusts difficulty as hash power grows. Same here. AI-generated code floods the ecosystem (iOS releases up ~60% YoY [13]). Easy niches fill. CRUD apps get mined out. What remains demands better evaluation functions, more compute. Red Queen’s Race, LLM edition.
The strategic response: treat inference budget the way miners treat hash power. Allocate it deliberately across search problems. Mining pools are already forming. OpenClaw’s ClawHub is early communal infrastructure for pointing agents at problems [7]. Expect more.
What’s Your Hash Rate?
Vibe coding was transitional, the brief moment humans stayed in the loop, adding “taste” that subtracted speed. Software mining now follows. LLM generates, test suite evaluates, selection does the rest.
Write the test suite. Crank the hash rate.
References:
[1] Karpathy, A. (2025). Original “vibe coding” post. X, February 2, 2025.
[2] Collins Dictionary (2025). Word of the Year 2025: Vibe Coding. November 2025.
[3] Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. METR.
[4] CodeRabbit (2025). State of AI vs Human Code Generation Report. December 2025.
[5] Koren, M., Békés, G., Hinz, J., & Lohmann, A. (2026). Vibe Coding Kills Open Source. arXiv, January 2026.
[6] SemiAnalysis (2026). Claude Code is the Inflection Point. February 2026. (~4% of GitHub commits.)
[7] Steinberger, P. (2025–2026). OpenClaw. GitHub.
[8] Novikov, A., et al. (2025). AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms. Google DeepMind.
[9] Nagda, A., et al. (2026). Reinforced Generation of Combinatorial Structures: Ramsey Numbers. March 2026.
[10] Imbue (2026). LLM-based Evolution as a Universal Optimizer. February 2026.
[11] Anthropic (2024). Claude’s SWE-bench Verified Performance. October 2024. (49% baseline.)
[12] Aider LLM Leaderboards (2025). Code editing benchmark scores. GPT-5 at 88%, October 2025.
[13] Gamigion (2026). iOS App releases jumped 60% in 2025 after three years of flat growth. February 2026.