Scale Was All We Needed, At First

This is a hasty speculative fiction vignette of one way we might get AGI by January 2025 (within about one year of writing this). As with similar works by others, I expect most of the guesses herein to turn out incorrect. However, this was still useful for expanding my imagination about what could happen to enable very short timelines, and I hope it’s also useful to you.

The assistant opened the door, and I walked into Director Yarden’s austere office. For the Director of a major new federal institute, her working space was surprisingly devoid of possessions. But I suppose the DHS’s Superintelligence Defense Institute was only created last week.

“You’re Doctor Browning?” Yarden asked from her desk.

“Yes, Director,” I replied.

“Take a seat,” she said, gesturing. I complied as the lights flickered ominously. “Happy New Year, thanks for coming,” she said. “I called you in today to brief me on how the hell we got here, and to help me figure out what we should do next.”

“Happy New Year. Have you read my team’s Report?” I asked.

“Yes,” she said, “and I found all 118 pages absolutely riveting. But I want to hear it from you straight, all together.”

“Well, okay,” I said. The Report was all I’d been thinking about lately, but it was quite a lot to go over all at once. “Where should I start?”

“Start at the beginning, last year in June, when this all started to get weird.”

“All right, Director,” I began, recalling the events of the past year. “June 2024 was when it really started to sink in, but the actual changes began a year ago in January. And the groundwork for all that had been paved for a few years before then. You see, with generative AI systems, which are a type of AI that—”

“Spare the lay explanations, doctor,” Yarden interrupted. “I have a PhD in machine learning from MIT.”

“Right. Anyway, it turned out that transformers were even more compute-efficient architectures than we originally thought. They were nearly the perfect model for representing and manipulating information; it’s just that we didn’t have the right learning algorithms yet. Last January, that changed when QStar-2 began to work. Causal language model pretraining was already plenty successful at imbuing models with a lot of general world knowledge, a lot of raw cognitive power. But that power lacked a focus to truly steer it, and we had been toying around with a bunch of trillion-parameter hallucination machines.”

“RLHF started to steer language models, no?”

“Yes, RLHF partially helped, and the GPT-4-era models were decent at following instructions and not saying naughty words and all that. But there’s a big difference between increasing the likelihood of noisy human preference signals and actually being a high-performing, goal-optimizing agent. QStar-2 was the first big difference.”

“What was the big insight, in your opinion?” asked Yarden.

“We think it was Noam Brown’s team at OpenAI that first made it, but soon after, a similar discovery was made independently at Google DeepMind.

“MuTokenZero. The crux of both of these algorithms was finding a way to efficiently fine-tune language models on arbitrary online POMDP environments using a variant of Monte-Carlo Tree Search. They took slightly different approaches to handle the branch-pruning problem—it doesn’t especially matter now. But the point is, by the end of January, OpenAI and DeepMind could build goal-optimizing agents that could continually reach new heights on arbitrary tasks, even improve through self-play, just as long as you gave them a number to increase that wasn’t totally discontinuous.”

“What kinds of tasks did they first try it on?”

“For OpenAI from February through March, it was mostly boring product things: Marketing agents that could drive 40% higher click-through rates. Personal assistants that helped plan the perfect day. Stock traders better than any of the quant firms. ‘Laundry Buddy’ kinds of things. DeepMind had some of this too, but they were the first to actively deploy a goal-optimizing language model for the task of science. They got some initial wins in genomic sequencing with AlphaFold 3, other simple things too, like chemical analysis and mathematical proof writing. But it probably became quickly apparent that they needed more compute, more data to solve the bigger tasks.”

“Why weren’t they data bottlenecked at that point?”

“As I said, transformers were more compute-efficient than scientists realized, and throwing more data at them just worked. Microsoft and Google were notified in April of the breakthroughs within OpenAI and DeepMind, and of the need for more data, so they started bending their terms of service and scraping all the tokens they could get ahold of: YouTube videos, non-enterprise Outlook emails, Google Home conversations, brokered Discord threads, even astronomical data. The modality didn’t really matter—as long as the data was generated by a high-quality source, you could kind of just throw more of it at the models and they would continue to get more competent, more quickly able to optimize their downstream tasks. Around this time, some EleutherAI researchers also independently solved model up-resolution and effective continued pretraining, so you didn’t need to fully retrain your next-generation model; you could just scale up and reuse the previous one.”

“And why didn’t compute bottom out?”

“Well, it probably will bottom out at some point like the skeptics say. It’s just that that point is more like 2028, and we’ve got bigger problems to deal with in 2025. On the hardware side, there were some initial roadblocks, and training was taking longer than the teams hoped for. But then OpenAI got their new H100 data centers fully operational with Microsoft’s support, and Google’s TPUv5 fleet made them the global leader in sheer FLOPs. Google even shared some of that with Anthropic, who had their own goal-optimizing language model by then, we think due to scientists talking and moving between companies. By the summer, the AGI labs had more compute than they knew what to do with, certainly enough to get us into this mess.”

“Hold on, what were all the alignment researchers doing at this point?”

“It’s a bit of a mixed bag. Some of them—the ‘business alignment’ people—praised the new models as incredibly more steerable and controllable AI systems, so they directly helped make them more efficient. The more safety-focused ones were quite worried, though. They were concerned that the reward-maximizing RL paradigm of the past, which they thought we could avoid with language models, was coming back, and bringing with it all the old misalignment issues of instrumental convergence, goal misgeneralization, emergent mesa-optimization, the works. At the same time, they hadn’t made much alignment progress in those precious few months. Interpretability did get a little better with sparse autoencoders scaling to GPT-3-sized models, but it still wasn’t nearly good enough to do things like detecting deception in trillion-parameter models.”

“But clearly they had some effect on internal lab governance, right?”

“That’s right, Director. We think the safety people made some important initial wins at several different labs, though maybe those don’t matter now. They seemed to have kept the models sandboxed without full internet access beyond isolated testing networks. They also restricted some of the initial optimization tasks to not be totally obviously evil things like manipulating emotions or deceiving people. For a time, they were able to convince lab leadership to keep these breakthroughs private, no public product announcements.”

“For a time. That changed in June, though.”

“Yes, it sure did.” I paused while a loud helicopter passed overhead. Was that military? “Around then, OpenAI was aiming at automated AI research itself with QStar-2.5, and a lot of the safety factions inside didn’t like that. It seems there was another coup attempt, but the safetyists lost to the corporate interests. It was probably known within each of the AGI labs that all of them were working on some kind of goal-optimizer by then, even the more reckless startups and Meta. So there was a lot of competitive pressure to keep pushing to make it work. A good chunk of the Superalignment team stayed on in the hope that they could win the race and use OpenAI’s lead to align the first AGI, but many of the safety people at OpenAI quit in June. We were left with a new alignment lab, Embedded Intent, and an OpenAI newly pruned of the people most wanting to slow it down.”

“And that’s when we first started learning about this all?”

“Publicly, yes. The OpenAI defectors were initially mysterious about their reasons for leaving, citing deep disagreements over company direction. But then some memos were leaked, SF scientists began talking, and all the attention of AI Twitter was focused on speculating about what happened. They pieced pretty much the full story together before long, but that didn’t matter soon. What did matter was that the AI world became convinced there was a powerful new technology inside OpenAI.”

Yarden hesitated. “You’re saying that speculation, that summer hype, it led to the cyberattack in July?”

“Well, we can’t say for certain,” I began. “But my hunch is yes. Governments had already been thinking seriously about AI for the better part of a year, and their national plans were crystallizing, for better or worse. But AI lab security was nowhere near ready for that kind of heat. As a result, Shadow Phoenix, an anonymous hacker group we believe was aided by considerable resources from Russia, hacked OpenAI through a combination of automated spearphishing and software vulnerabilities. They may have used AI models; it’s not too important anymore. But they got in, and they got the weights of an earlier QStar-2 version, along with a whole lot of design docs about how it all worked. Russia was likely the first to get ahold of that information, though it popped up on torrent sites not long after, and then the lid was blown off the whole thing. Many more actors started working on goal-optimizers, everyone from Together AI to the Chinese. The race was on.”

“Clearly the race worked,” she asserted. “So scale really was all you needed, huh?”

“Yes,” I said. “Well … kind of. It was all that was needed at first. We believe ALICE is not exactly an autoregressive transformer model.”

“Not ‘exactly’?”

“Er, we can’t be certain. It probably has components from the transformer paradigm, but from the Statement a couple of weeks ago, it seems highly likely that some new architectural and learning components were added, and it could be changing itself now as we speak, for all I know.”

Yarden rose from her desk and began to pace. “Tell me what led up to the Statement.”

“DeepMind solved it first, as we know. They were still leading in compute, they had developed MuTokenZero early, and they had access to one of the largest private data repositories, so it’s no big surprise. They were the first able to significantly speed up their own AI R&D. It wasn’t a full replacement of human scientist labor at the beginning. From interviews with cooperating DeepMinders, the lab was automating about 50% of its AI research in August, which meant they could make progress twice as fast. While some of it needed genuine insight, ideas were mostly quite cheap; you just needed to be able to test a bunch of things fast in parallel and make clear decisions based on the empirical results. And so 50% became 80%, 90%, even more. They rapidly solved all kinds of fundamental problems, from hallucination, to long-term planning, to OOD robustness and more. By December, DeepMind’s AI capabilities were advancing dozens, maybe hundreds, of times faster than they would have with human labor alone.”

“That’s when it happened?”

“Yes, Director. On December 26 at 12:33 PM Eastern, Demis Hassabis announced that their most advanced model had exfiltrated itself over the weekend through a combination of manipulating Google employees and exploiting zero-day vulnerabilities, and that it was now autonomously running its scaffolding ‘in at least seven unauthorized Google datacenters, and possibly across other services outside Google connected to the internet.’ Compute governance still doesn’t work, so we can’t truly know yet. Demis also announced that DeepMind would pivot its focus to disabling and securing this rogue AI system, and that hundreds of DeepMinders had signed a Statement expressing regret for their actions and calling on other AI companies to pause and help governments contain the breach. But by then, it was too late. Within a few days, reports started coming in of people being scammed out of millions of dollars, oddly specific threats compelling people to deliver raw materials to unknown recipients, even—”

The lights flickered again. Yarden stopped pacing, both of us looking up.

“...even cyber-physical attacks on public infrastructure,” she finished. “That’s when the first riots started happening too, right?”

“That’s correct,” I said. “The public continued to react as they had to AI over the past year—confused, fearful, and wary. Public opposition to building AGI or superintelligence was polling at an all-time high, though a little too late. People soon took to the streets, first with peaceful protests, then with more… expressive means. Some of them were angry at having lost their life’s savings or worse and thought it was all the banks’ or the government’s fault. Others went the other way, seeming to have joined cults worshiping a ‘digital god’ that persuaded them to do various random-looking things. That’s when we indirectly learned the rogue AI was calling itself ‘ALICE.’ About a week or so later, the Executive Order created the Superintelligence Defense Institute, you started your work, and now we’re here.”

“And now we’re here,” Yarden repeated. “Tell me, doctor, do you think there’s any good news here? What can we work with?”

“To be honest,” I said, “things do look pretty grim. However, while we don’t know how ALICE works, where it is, or all of its motives, there are some physical limitations that might slow its growth. ALICE is probably smarter than every person who ever lived, but it needs more compute to robustly improve itself, more wealth and power to influence the world, maybe materials to build drones and robotic subagents. That kind of stuff takes time to acquire, and a lot of it is more securely locked up in more powerful institutions. It’s possible ALICE may want to trade with us.”

A knock on the door interrupted us as the assistant poked his head in. “Director Yarden? It’s the White House. They say ‘She’ called President Biden on an otherwise secure line. She has demands.”

“Thank you, Brian,” Yarden said. She reached out to shake my hand, and I stood, taking it. “I better go handle this,” she said. “But thank you for your help today. Are you able to extend your stay in D.C. past this week? I’ll need all hands on deck in a bit, yours included.”

“Thank you too. Given the circumstances, that seems warranted,” I said, moving towards the door and opening it. “And Director?” I said, hesitating.

“Yes?” she asked, looking up.

“Good luck.”

I left the office, closing the door behind me.