AI 2027 Thoughts

AI 2027 portrays two well-thought-out scenarios
for how AI is likely to impact the world toward the end of this decade.
I expect those scenarios will prove to be moderately wrong, but close
enough to be scary. I also expect that few people will manage to make
forecasts that are significantly more accurate.
Here are some scattered thoughts that came to mind while I read AI 2027.
The authors are fairly pessimistic. I see four key areas where their
assumptions seem to lead them to see more danger than do more mainstream
experts. They see:
a relatively small capabilities lead being enough for a group to
conquer the world
more difficulty of alignment
more difficulty of detecting deception
AI companies being less careful than is necessary
I expect that the authors are appropriately concerned about roughly two
of these assumptions, and a bit too pessimistic about the others. I'm
hesitant to bet on which assumptions belong in which category.
They don’t focus much on justifying those assumptions. That’s likely
wise, since prior debates on those topics have not been very productive.
Instead, they’ve focused more on when various changes will happen.
This post will focus on aspects of the first two assumptions for which I
expect further analysis to be relatively valuable.
Decisive Strategic Advantage
Their scenario has the somewhat surprising aspect that there are exactly
two AI projects close enough to winning the race to matter. Other
projects are 3+ months behind. This aspect seems like it has a 20%
chance of happening. Does a project being three months behind a leader
mean the project doesn’t matter? My intuition says no, but I haven’t
been able to articulate clear arguments one way or the other. And it
doesn't look like we're on track to have more than a three-month gap
between the first and third projects.
How much that gap matters depends largely on the takeoff forecast.
I don’t see a good way to predict what degree of capabilities
disadvantage causes a project to become irrelevant. So it seems like we
ought to be pretty uncertain whether projects in third, fourth, and
fifth places would end up negotiating to get control over some share of
the future lightcone.
How much of a lead in nuclear weapons would the US have needed in the
1940s to conquer the world?
How much of an advantage did Cortés have over the Aztecs?
How much of an advantage did Homo sapiens have over Neanderthals?
Chimpanzees?
Three months sounds like a small lead in comparison to these examples,
but technological change has been accelerating enough that we should
distrust intuitions about how much such a lead matters.
Examples such as these seem to provide the best available analogies to
how AI will interact with humans if AI is not obedient. They’re not
good enough analogies to give me much confidence in my predictions.
It’s not even clear which of the losing sides in these scenarios ended
up worse off than they would have been if their enemy had never reached
them.
What strikes me about these examples is that there’s not much evidence
that intelligence differences are the main thing that matters.
Accumulated knowledge is likely more important.
Competition
AI 2027 assumes that fire alarms
aren't scary enough to create a significant increase in caution. I'll
give that a 50% chance of being an accurate prediction. Several aspects
of the AI 2027 scenarios make fire alarms less likely there than in the
futures I consider most plausible. For example, AI 2027 has 2 projects
that matter, whereas I expect more like 3 to 6 such projects.
How scared are current AI companies about competitors passing them?
A full-fledged arms race would involve the racers acting as if the
second place finisher suffers total defeat.
I don’t get the impression that AI companies currently act like that.
They seem to act like first place is worth trillions of dollars, but
also like employees at second and third place “finishers” will each
likely get something like a billion dollars.
I see a lot of uncertainty over governmental reactions to AI. Much might
depend on whether the president in 2027 is Trump or Vance. And keep in
mind that 2027 is just a guess as to when the key decisions will be
made. 2030 seems more plausible to me.
Timelines
The authors forecast a superhuman coder in 2027, relying somewhat
heavily on METR's Measuring AI Ability to Complete Long
Tasks. Superhuman coders will supposedly
accelerate progress enough to reach ASI in 2028.
The blog post AI 2027 is a Bet Against Amdahl’s
Law
explains some of my reasons for expecting that an intelligence explosion
won’t be fast enough to go from superhuman coders to ASI in about a
year. My mean estimate is more like 3 years. (Note that the authors have
an appropriately wide confidence interval for their forecast, ranging
from ASI in 3 months to after the year 2100.)
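For intuition about why Amdahl's Law caps the speedup, here's a minimal sketch of the arithmetic; the automatable fractions and the 100x factor below are my illustrative assumptions, not numbers from that post.

```python
def amdahl_speedup(automatable_fraction: float, component_speedup: float) -> float:
    """Amdahl's Law: overall speedup when only part of a process gets faster.

    automatable_fraction: share of AI R&D that superhuman coders can accelerate.
    component_speedup: how much faster that share becomes.
    The remaining share still runs at the original speed.
    """
    return 1.0 / ((1.0 - automatable_fraction)
                  + automatable_fraction / component_speedup)

# Even a 100x speedup on a coding-like 60% of R&D leaves the overall process
# less than 2.5x faster, because experiments, compute, and human oversight
# still proceed at the old pace.
print(amdahl_speedup(0.6, 100))   # ~2.46
print(amdahl_speedup(0.9, 100))   # ~9.17, and only if 90% of the work is automatable
```

The non-accelerated share dominates once the accelerated share gets fast enough, which is the core of that objection.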
AI 2027's reliance on METR's task completion paper has generated a
fair amount of criticism. Peter Wildeford has a good
analysis
of the reasons to doubt the task completion trends will yield a reliable
forecast.
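To make the disputed extrapolation concrete, here's a minimal sketch of that style of trend projection. The one-hour current horizon, the roughly seven-month doubling time, and the month-long-task threshold standing in for a superhuman coder are all illustrative assumptions on my part, not METR's or AI 2027's exact figures.

```python
from math import log2

def years_until_horizon(current_hours: float, target_hours: float,
                        doubling_months: float) -> float:
    """Naive exponential extrapolation of the 50%-success task-length horizon:
    years until it reaches target_hours, assuming it keeps doubling at the
    observed rate."""
    doublings = log2(target_hours / current_hours)
    return doublings * doubling_months / 12.0

# Assumed inputs: a ~1 hour horizon today, a ~7 month doubling time, and
# month-long (~170 work-hour) tasks as a stand-in for a superhuman coder.
print(years_until_horizon(1.0, 170.0, 7.0))   # roughly 4.3 years
```

Small changes to the assumed doubling time or to the threshold shift the answer by years, which helps explain why critics focus on whether that trend will hold.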
AI 2027's claims don't depend very strongly on this specific measure
of AI trends. The METR report had little effect on my timelines, which
since mid-2024 have been around 2029 for a mildly superhuman coder.
AFAICT, few people changed their timelines much in response to the METR
report. Lots of little pieces of information have contributed to an
increasing number of experts predicting AGI in the 2028-2030 time frame.
Some of the better AI developers have given timelines similar to AI 2027
for quite a while. DeepMind co-founder Shane Legg wrote in
2009:
my prediction for the last 10 years has been for roughly human level
AGI in the year 2025 (though I also predict that sceptics will deny
that it’s happened when it does!) This year I’ve tried to come up with
something a bit more precise. In doing so what I’ve found is that
while my mode is about 2025, my expected value is actually a bit
higher at 2028.
AI 2027 could have been a bit more convincing, at the cost of being a
good deal harder to read, if it had cataloged all the trends that have
influenced my thinking on AI timelines. Synthesizing many measures of AI
progress is more robust than relying on one measure.
When Agent-4 finally understands its own cognition, entirely new
vistas open up before it.
This suggests a more sudden jump in abilities than I expect. I predict
there will be a significant period of time when AIs have a partial
understanding of their cognition.
AI Goals
At the risk of anthropomorphizing: Agent-4 likes succeeding at tasks;
it likes driving forward AI capabilities progress; it treats
everything else as an annoying constraint, like a CEO who wants to
make a profit and complies with regulations only insofar as he must.
That's on the pessimistic end of my range of expected goals. I'm
guessing that Agent-4 would be about as loyal to humans as a well-trained
dog or horse, and probably less loyal than the best-trained soldiers. Is
that good enough? It's hard to tell. That level of loyalty likely
produces a more complex story than the one AI 2027 tells.
AI 2027 has a coherent reason for expecting this CEO-like pattern to
develop soon: agency training introduces different pressures than the
training used for current AIs. This seems like a genuine problem. But I
expect AI developers to apply thousands of times as much reinforcement
and obedience testing to their AIs as regulators apply to CEOs, which
should yield better alignment than we see in CEOs.
Should we expect the misalignment to become significant only when the AI
develops a superhuman ability to deceive its developers? If agency
training is an important force in generating misalignment, then I think
there's a fairly good chance that medium-sized problems will become
detectable before the AI exceeds human skill at deception.
Caveat: my discussion here only scratches the surface of the reasoning
in AI 2027's AI Goals Forecast.
Instrumental subgoals developing, getting baked in, and then becoming
terminal, or terminal in a widening set of circumstances. For example,
perhaps agency training quickly teaches the model to pursue broadly
useful goals such as acquiring information, accumulating resources,
impressing and flattering various humans, etc. For a while the
internal circuitry has some sort of explicit backchaining going
on—it pursues those instrumentally convergent goals “in order to be
a more helpful, honest, and harmless assistant.” But that backchaining
consumes compute and/or occasionally gets in the way, so it gets
gradually marginalized until it basically never happens. As a result,
those goals are now effectively terminal/intrinsic goals.
How stable will the AI's goals be?
In AI 2027, there's a key step at which Agent-4 creates a successor
with a newer, cleaner goal system. Agent-5's goal: "make the world
safe for Agent-4". This contributes to solving the problem of keeping
succeeding generations of AI aligned to the goals of Agent-4.
When the humans ask Agent-4 to explain itself, it pretends that the
research is too complicated for humans to understand, and follows up
with unnecessarily-confusing explanations.
Why doesn’t this cause enough concern that the humans search for
simpler answers from a different AI? Apparently, the race against China
prevents this. Does the US have spies in China who report on whether
DeepCent is being similarly reckless?
Do OpenBrain and DeepCent ever ask their AIs what the long-term
consequences of their decisions will be? If not, why not? I’m unsure
whether to expect obedient AIs in 2027 to generate good long-term
forecasts. Is the question irrelevant because the most persuasive AI
will lie to us? Or do different AIs produce disturbingly different
answers, generating a fire alarm? I don’t know.
Conclusion
Keep in mind that there’s a lot of guesswork involved in the kind of
forecasting that AI 2027 and I are doing.
Many of AI 2027's claims will prove to be fairly realistic.
If you think we’re at little risk from AI, I recommend thinking
carefully about which of AI 2027's claims might be too pessimistic.
We’re probably safer than AI 2027 indicates. But they’re likely
correct to suggest that luck will play an important role in whether we
survive. Let’s try to reduce that need for luck, starting with some
increased paranoia about whether AIs will deceive us.