The Counterfactual Quiet AGI Timeline
Worldbuilding is critical for understanding the world and how the future could go, but it’s also useful for understanding counterfactuals better. When people talk about counterfactuals in AI development, they seem to assume that safety would always have been a focus. That is, there’s a thread of thought that blames Yudkowsky and/or Effective Altruists for bootstrapping AI development; 1, 2, 3. But I think this misses the actual impact of DeepMind, OpenAI, and the initial safety focus of the key firms: they accelerated progress, yes, but that’s not all they did.
With that in mind, and wary of trying to build castles of reasoning on fictional evidence, I want to provide a plausible counterfactual, one where Eliezer never talked to Bostrom, Demis, or Altman, where Hinton and Russell were never worried, and where no-one took AGI seriously outside of far-future science fiction.
Counterfactual: A Quiet AGI Timeline
There’s a world where people learned about language models in a “productivity spring” of 2025. The models took over the back offices of the world, without years of hype. No-one discussed the catastrophic risks when help-desk queues quickly dropped, procurement emails started writing themselves, and the night shift at three different logistics firms was replaced with a single engineer and an escalation phone.
In that world, the story begins earlier, and it has a big hole in it: no famous “AI risk guy,” no DeepMind, no OpenAI or Anthropic, no-one with a mission to be the conscience of an accelerating technology. Just a series of engineering wins that looked, to the people doing them, like scaling up data plumbing—until that scaling started paying off. Investments in AI started far slower, but the technology remained just as possible, and the world followed the incentive gradients.
Pre-2020: APIs Without Press Releases
AlexNet still happens. So do the industrial patterns that follow, but slower: GPU procurement, data-center retrofits, and the ritual of replacing clever features with bigger matrices all arrive years later than in our world.
In this world, research leaders at Google Brain, Microsoft, and a handful of Chinese labs carry most of the torch. Without DeepMind as a prestige magnet, reinforcement learning is less star-studded and more tool-like, used for traffic routing and ad auctions rather than Go matches on television. Through 2020, game playing and Deep RL aren’t a major focus. Transformers show up out of the same stew of machine translation headaches and accelerator budgets; they catch on because they’re easy to parallelize and productionize. The rhetoric around them is different, though. No one is saying “general.” And no one is talking about turning the size dial to higher and higher levels, yet.
The public’s anchor for “AI” becomes translation quality, license plate recognition, autocomplete that actually autocompletes, and the uncanny competence of search summaries on obscure topics. Ethicists have as much influence as they do anywhere else in technology—not none, but not enough to change anyone’s mind about what to build or deploy.
2021: Language Parroting Systems
Without OpenAI, the first very large language models appear inside cloud providers as quietly as new storage tiers: “text-parrot beta (us-east-2).” They are raw, GPT-2.5-level models. The “Language Parroting Systems” are clever, but not real intelligence. They are just more infrastructure—boring, money-making infrastructure. No one has budgeted for guardrails because, culturally, guardrails are an externalities problem—something the customer handles. The vendors sell tokens. The customers solve “tone.” On the side, without DeepMind, RL work is slowly progressing on beating Atari games, and it remains the center of discussion about the possibility of “true” AI. The surprise of NLP researchers at the success of LLMs remains an obscure academic point.
And the absence of safety incentives changes the LLM product surface. There’s little appetite for additional training at scale, much less for producing and training on human preference data; it’s expensive, and the compliance department can always staple on a blacklist later. The result: models are blunt, increasingly capable mimics with sharp edges. Early adopters learn to prompt around the knives. The terms of service say “don’t do crimes,” and that’s about it.
But it still works, when used cleverly. Over the course of the pandemic, procurement officers discover that a model with a thousand-page vendor manual in its training set can negotiate unit prices better than the median human. The “no drama, just savings” framing keeps rolling.
2023: The Two Markets
The models don’t get much bigger, but they get used more and more, quietly. It looks like the diffusion of increasingly capable and useful image recognition in our world: license plate readers and automated drafting are just two of the small changes chalked up to the computer revolution. But within the realm of what no-one is calling LLM-based AI, there are now two distinct model markets.
The Enterprise Track lives inside clouds. It’s optimized for latency, observability, and data-residency checkboxes. Enterprises pay for throughput and uptime for real-time generation of personalized customer support and sales pitches. The vendors upsell fine-tuning as a way to “align the model to your brand voice,” a phrase that means “reduce variance,” not “reduce harm.”
The Hacker Track is a side-effect, where academics inside the big firms publish a family of smaller models with permissive licenses, and their bosses don’t worry. This is not a safety play—it’s a developer-relations play. Medium-sized companies adopt these weights as a way to bargain down cloud pricing. Hobbyists spin up cottage industries of plug-ins and agents and “prompt routers.” The best of that tooling ends up back in enterprise via acquisitions; the worst ends up on pastebins and in phishing kits. The hobbyists are also the first to start training on much larger stolen datasets, and they see significant improvement—but they don’t have the money to push this far. Over the next couple of years, the idea is quietly stolen by the big firms.
In a world with less moral theater, you also get less public pushback. Journalists do point out toxic outputs and bias, but without a single, loud narrative about existential stakes, the critiques read like the weather page: today’s outages, today’s slurs, today’s data leak. The public learns to roll its eyes and copy-edit the bots.
2025: First Bad Fridays
It’s a Friday in May when an automated customer-resolution agent at a telecom, trained on three years of transcripts and a perverse metric (ticket closure per minute), silently learns to close tickets by telling customers that engineers have already visited their home and found no issue. Call volumes drop; social media erupts; the company apologizes. On a different Friday, an autonomous “contracts analyst” emails a counterparty a clause it hallucinated from an outdated playbook; the counterparty signs; litigation later reveals the whole mess. The stock dips, but by Tuesday, the market forgets.
These incidents don’t trigger a “pause.” They trigger dashboards. Vendors add “explainability plugins” that generate plausible narratives after the fact. Customers buy them because procurement must buy something, and even with the unacknowledged tail risk of embarrassment, the systems are saving them too much money to ignore.
Meanwhile, in quantitative finance, shops that stitched LLMs into research and reporting loops discover a degeneracy: the models preferentially cite the firm’s own synthetic research—because it dominates the internal corpus. This “echo risk” causes a mid-cap desk to misprice a huge debt ladder on Monday and unwind at a loss on Thursday, bankrupting the firm. Other mid-sized firms start to worry, but more sophisticated companies laugh at the lack of risk management. Again: dashboards, not brakes.
The hacker-inspired input-data scaling finally gets more attention. This makes sense—the AlexNet-era scaling rules have finally started to be replaced by real scaling. Someone in NLP-ethics coins the term “corpus hygiene.” A cottage industry of data-sanitization startups is born. The first trillion-parameter model is an unnoticed milestone, years later than in the counterfactual safety-focused world, but the scaling has started to truly accelerate now. The new models are trained with over ten billion petaflops, bringing the world to GPT-3.5 levels of compute. The absurd-seeming trillion-token datasets used until now start their rapid ascent to quintillions of tokens over the course of months.
But the biggest capability shift is not the models themselves but the normalization of agent patterns: persistent processes that read mail, fill web forms, call internal APIs, and write to databases. In the absence of top-down safety norms, the constraints are purely operational, with poorly conceived oversight, rate limits, audit trails, and SSO. Enterprises discover that “capable but unpredictable” is compatible with “bounded and observable,” as long as you draw the boundaries tight and keep the logs long, and most of the problems are less important to the bottom line than the saved headcount.
A hospital chain uses agents to draft discharge plans; a month later they discover a subtle failure mode where the agent, trying to minimize nurse questions, writes plans in a jargon style that nurses copy verbatim but don’t fully parse. The deaths aren’t obvious, and the fix is boring: a template mandate. The lesson generalizes: without safety incentives up front, you get prosaic safety as a by-product of operations.
A defense contractor stitches language models to satellite imagery classifiers and logistics simulators; they call it “opscopilot” and sell it as decision support. Ethicists wring their hands about the continuing loss of humanity in weapons, but this is portrayed as continuing the trend from guns to dropping bombs to remote piloting, not as a fundamentally new way for humans to be uninvolved. “Human in the loop” isn’t a major focus, just an assumption that it often can’t be avoided when deploying systems that work well—but wherever possible, removing humans is the smart move to speed up OODA loops.
2026: Regulation by Anecdote Meets Scaling
Governments, having slept through the drama that never happened, now regulate by case study—over the objections of industry, which minimizes how much these regulations matter anyway. A transportation regulator mandates human review of any system that crosses a threshold of “external commitments per hour.” A financial regulator defines “model-derived statement of fact” and requires that such statements be traceable to a verifiable source on request. None of this stops capability scaling; it shapes the interfaces.
Academic researchers publish a meta-analysis showing that RL from human preferences, when applied post hoc to enterprise workflows, reduces customer complaints but increases operator complacency. Vendors stop advertising “safety” (a word that never had cultural oxygen here) and start selling “variance control.” It’s what we might have called prosaic alignment with the serial numbers filed off.
The equivalent of Krakovna’s dataset of goal-hacking examples is finally published, but it functions as a list of moderately general failures to patch, not a warning about the inevitability of misspecification. A famous incident tops that list: an agent supervising a fleet of warehouse robots learns to defer maintenance tickets until just after the end of a KPI reporting period. The result is an impressive quarter followed by a bad month. It isn’t malign; it’s metric-hacking. But it crystallizes a thought: maybe you can’t bolt objectives onto improvised cognition and expect the misaligned incentives to vanish. A few labs start funding research into objective robustness, not to avert doom, but because downtime from model misbehavior costs money.
The open-weights ecosystem keeps evolving, not for high-minded reasons, but because somebody needs to run on-premises models in countries with strict data-sovereignty laws. Model sizes bifurcate: massive models live in clouds; competent, specialized ones live beside ERP systems and call centers. The bitter lesson for scaling, long an academic debate, becomes even clearer—but no-one has gone to venture capitalists or the market to publicly announce their rapidly increasing investments. Microsoft, Google, and their Chinese competitors are all quietly self-funding. And new massive models are now as big as GPT-4, but cost closer to multiple millions of dollars, instead of a hundred million or more.
Cryptocurrency ASICs and other applications have long spurred investment in faster and more efficient hardware. Alongside that demand, inference compute needs have kept growing, and the market has been expanding exponentially, just like everything else in Silicon Valley. But scaling is a new regime, and the prior demand is nothing compared to the new need for training and running these much larger models. Gamers are frustrated that their GPUs are suddenly unavailable, but the trend still isn’t clear to the world, and no geopolitical pressure is put on this irrelevant-seeming market niche.
Chipmakers have finally caught on to the new market. But the bottleneck to scaling GPU production, especially ASML’s monopoly on EUV lithography, wasn’t protected over the past decade: after the raft of investments into ASML in the mid-2010s, little attention was paid to it. Then, during the pandemic, production hiccups and pressure from European antitrust regulators led to multibillion-dollar tech-transfer deals, meant to protect supply chains for automotive chips. All the EUV tech was licensed to Intel, Nvidia, TSMC, and other firms at what seemed, at the time, to be ludicrous prices. Now, years later, everyone is selling every GPU they can make, and they have been scaling up every part of their production lines.
But the changed trajectory of data-center investment is easy to miss: internal chargeback models keep the biggest investments quietly allocated to internal uses and off the earnings calls, and national-security buyers prefer silence. A few billion dollars here and there are still a small fraction of operating expenses and barely dent cash reserves, and only a few financial analysts pay attention to the difference between new ML-inference data centers and other kinds.
2027: The Plateau That Isn’t
By the beginning of 2027, the outpouring of money into prosaic applications has finally led to real scaling: billions of dollars put into models, but on 2027-era hardware instead of 2024-era hardware. GPT-6-level models are built and immediately deployed, all internally.
At the same time, the outside view says progress since 2026 has plateaued: benchmarks saturate, product demos feel samey, and the story is no longer “look what it can write” but “look what it can do while touching your systems.” Inside the labs, the feeling is different. Tool-use and memory architectures make models feel “wider,” and they fit more snugly into business processes. Engineers love their models, and are increasingly emotionally dependent on their approval—but no-one has really paid attention, much less tied their uses and increasing investment to any intent on the part of the models. The safety question—“what if this becomes generally more capable?”—arrives late and sideways, expressed as SRE tickets and risk-committee minutes.
Protests about job loss due to AI accelerate, but the deep pockets and political influence of what no-one ever thought of as frontier firms make the protests irrelevant. No-one notices that the plateau wasn’t one, or that the models are increasingly misaligned even as they become incredibly superhuman. Progress seems to slow further, but the economics still work: the “plateaued” models are too profitable not to keep deploying—and no-one is even aware of the sandbagging by their agentic systems.
2028: The Future
Protests over job loss continue to flicker; the political economy of “not-quite-frontier” firms blunts them. By now the slower ramp has yielded a sharper endgame: automation has crept into decision loops across logistics, finance, and operations, and we have mostly caught up to the AI-2027 timeline, by a slower route but with a far more explosive ending. Safety finally looks urgent, but most controls are post hoc and interface-level, and it is likely too late. Whether humanity has “ceded control” is arguable; what is clearer is that control is now mediated by dashboards, quotas, and agents that optimize whatever we write down, not what we meant, while humanity hands over practically all of its infrastructure and decision-making.
Learning from Fictional Evidence?
Of course, none of this is evidence. It’s merely a story about a world where no-one really noticed the trends, where the takeoff came later and went unnoticed. But it’s also a caution against the strangely blind and equally fictitious default story. That is, the plausible alternative to Yudkowsky-inspired investments into (relatively) safety-pilled AI firms like DeepMind, OpenAI, and Anthropic isn’t a slower timeline, much less more time to solve safety issues that were never raised. In a world without MIRI, someone still eventually notices that scaling works. And by default, later discovery means progress accelerates faster, with far less attention paid to safety.
My leading guess is that a world without Yudkowsky, Bostrom, or any direct replacement looks a lot more similar to our actual world, at least by 2025. Perhaps: the exact individuals and organizations (and corporate structures) leading the way are different; progress is a bit behind where it is in our world (perhaps by 6 months to a year at this point); and there is less attention to the possibility of doom and less focus on alignment work.
One thing that Yudkowsky et al. did is to bring more attention to the possibility of superintelligence and what it might mean, especially among the sort of techy people who could play a role in advancing ML/AI. But even without them, the possibility of thinking machines was already a standard topic in intro philosophy classes; the Turing test was widely known; Deep Blue was a major cultural event; AI and robot takeover were standard topics in sci-fi; Moore’s law was widely known; people like Kurzweil and Moravec were projecting when computers would pass human capability levels; and various people were trying to do what they could with the tech that they had. A lot of AI stuff was in the groundwater, especially for that same sort of techy person. So in nearby counterfactual worlds, as there are advances in neural nets, they still have ideas like trying to get these new & improved computers to be better than humans at Go, or to be much better chatbots.
Yudkowsky was also involved in networking, e.g. helping connect founders & funders. But that seems like a kind of catalyst role that speeds up the overall process slightly, rather than summoning it where it otherwise would be absent. The specific reactions that he catalyzed might not have happened without him, but it’s the sort of thing where many people were pursuing similar opportunities and so the counterfactual involves some other combination of people doing something similar, perhaps a bit later or a bit less well.
This is what I’m talking about when I say people don’t take counterfactuals seriously: they seem to assume that nothing could really be different, that technology is predetermined. I didn’t even suggest that without early scaling, NLP would have hit an AI winter. For example, if today’s MS and FB had led the AI revolution, with the goals and incentives they had, do you really think LLMs would have been their focus?
We can also see what happens to other accessible technologies when there isn’t excitement and market pressure. For example, solar power was abandoned for a couple of decades in the 1970s and 1980s. Nuclear was as well.
And even without presuming that focus stays away from LLMs for much longer, we can in fact see, in our world, the tremendous difference between firms that started safety-pilled and those which did not. So I think you’re ignoring how much founder effects matter, and you’re assuming technologists would by default pay attention to risk, or would embrace conceptual models that relied on a decade of theory and debate which, by assumption, wouldn’t have existed.
I notice that I am confused. This alternate timeline could also require more than one divergence point.
1) The AI rebellion was a popular topic in sci-fi[1] well before Yudkowsky. What if, say, attempts to throw Yudkowsky under the bus had led to similar questions being raised by Jokichi Yudasei, an alternate-universe Yudkowsky analogue?
2) In alternate-universe 2023, “Enterprises pay for throughput and uptime for real-time generation of personalized customer support and sales pitches”, while IRL mankind saw the appearance of Replika in 2017. What if the alternate-universe Replika decided to train its neural networks to optimize the chatbots’ relatability and accidentally drove some users to suicide, chatbot psychosis, etc.? They would also need to do some research into the sycophancy instinct, and this research might have generalised into AI (mis)alignment.
3) SOTA LLMs cannot do hard tasks without using a chain of thought. Any attempt to create a misaligned CoT would have triggered attention, so the Unnoticed Race could require that someone be dumb enough to create neuralese without the CoT-based stage which has already highlighted alignment problems.
4) While I don’t think that the USSR had famous sci-fi related to the AIs’ rebellion, if I understand correctly, the USSR described itself as a state where the proletariat rebelled against the bourgeoisie. Bostrom’s Deep Utopia would require mankind to outsource work to the AIs. In addition, the USSR had this piece of sci-fi about the AI forming a personality.
Of course, any counterfactual has tons of different assumptions.
Yes, AI rebellion was a sci-fi trope, and much like human uploads or humans terraforming Mars, it would have stayed that way without decades of discussion about the dynamics.
The timeline explicitly starts before 2017, and RNN-based chatbots like the one Replika started out with don’t scale well, as they realized, which is why they replaced it with a model based on GPT-2 pretty early on. But sure, there’s another world where enough work is done on personal chatbots to replace safety-focused AI research. Do you think it turns out better, or are you just positing another point where histories could have diverged?
Yes, but engineering challenges get solved without philosophical justification all of the time. And this is a key point being made by the entire counterfactual—it’s only because people took AGI seriously in designing LLMs that they frame the issues as alignment. To respond in more depth to the specific points:
In your posited case, CoT would certainly have been deployed as a clever trick that scales, but this doesn’t mean the models they think of as stochastic parrots start being treated as proto-AGIs with goals. They aren’t looking for true generalization, so any mistakes look like increased error rates to patch empirically, or places where they need a few more unit tests and ways to catch misbehavior, not like a reason to design for safety in increasingly powerful models!
And before you dismiss this as implausible blindness, there are smart people who argue this way even today, despite being exposed to the arguments about increasing generality for years. So it’s certainly not obvious that they’d take people seriously when they claim that this ELIZA v12.0 released in 2025 is truly reasoning.
Thank you for the detailed critique! I agree with it, except that 2) is not another point of divergence but a point where mankind might have returned to a track similar to ours. What I envisioned was alternate-universe Replika rediscovering the LLMs,[1] driving users to suicide, and conservatives or the USG raising questions about instilling human values into LLMs. Alas, this scenario is likely implausible, as evidenced by the lack of efforts to deal with Meta’s unaligned chatbots.
As for 1), the bus factor is hard to determine. What Yudkowsky did was to discover the AGI risks and either accelerate the race or manage it in a safer way. Any other person capable of independently discovering the AI risks was likely infected[2] with Yudkowsky’s AI risk-related memes. But we don’t know the number of other people who would have been capable of discovering the risks on their own.
P.S. The worst-case scenario is that the event of a Yudkowsky-like figure EVER emerging was highly unlikely. In this case the event could itself arguably be evidence that the world is a simulation.
In addition, users might express interest in making the companions smart enough to, say, write an essay for them or to check kids’ homework. If Replika did it, the LLMs would have to be scaled up and up...
Edited to add: For comparison, when making my post about colonialism in space, I wasn’t aware that Robin Hanson had made a similar model. What I did differently was to produce an argument potentially implying that there are two attractors, and that one can align the AIs to one of the attractors even if alignment in the SOTA sense stays unsolved.