AI 2027 Thoughts

AI 2027 portrays two well-thought-out scenarios
for how AI is likely to impact the world toward the end of this decade.
I expect those scenarios will prove to be moderately wrong, but close
enough to be scary. I also expect that few people will manage to make
forecasts that are significantly more accurate.
Here are some scattered thoughts that came to mind while I read AI 2027.
The authors are fairly pessimistic. I see four key areas where their
assumptions seem to lead them to see more danger than do more mainstream
experts. They see:
a relatively small capabilities lead being enough for a group to
conquer the world
more difficulty of alignment
more difficulty of detecting deception
AI companies being less careful than is necessary
I expect that the authors are appropriately concerned about roughly two
of these assumptions, and a bit too pessimistic about the others. I'm
hesitant to bet on which assumptions belong in which category.
They don’t focus much on justifying those assumptions. That’s likely
wise, since prior debates on those topics have not been very productive.
Instead, they’ve focused more on when various changes will happen.
This post will focus on aspects of the first two assumptions for which I
expect further analysis to be relatively valuable.
Decisive Strategic Advantage
Their scenario has the somewhat surprising aspect that there are exactly
two AI projects close enough to winning the race to matter. Other
projects are 3+ months behind. This aspect seems like it has a 20%
chance of happening. Does a project being three months behind a leader
mean the project doesn’t matter? My intuition says no, but I haven’t
been able to articulate clear arguments one way or the other. And it
doesn't look like we're on track to have more than a three-month gap
between the first and third projects.
How much that gap matters depends largely on the takeoff forecast.
I don’t see a good way to predict what degree of capabilities
disadvantage causes a project to become irrelevant. So it seems like we
ought to be pretty uncertain whether projects in third, fourth, and
fifth places would end up negotiating to get control over some share of
the future lightcone.
How much of a lead in nuclear weapons would the US have needed in the
1940s to conquer the world?
How much of an advantage did Cortés have over the Aztecs?
How much of an advantage did Homo sapiens have over Neanderthals?
Chimpanzees?
Three months sounds like a small lead in comparison to these examples,
but technological change has been accelerating enough that we should
distrust intuitions about how much such a lead matters.
Examples such as these seem to provide the best available analogies to
how AI will interact with humans if AI is not obedient. They’re not
good enough analogies to give me much confidence in my predictions.
It’s not even clear which of the losing sides in these scenarios ended
up worse off than they would have been if their enemy had never reached
them.
What strikes me about these examples is that there’s not much evidence
that intelligence differences are the main thing that matters.
Accumulated knowledge is likely more important.
Competition
AI 2027 assumes that fire alarms
aren't scary enough to create a significant increase in caution. I'll
give that a 50% chance of being an accurate prediction. Several aspects
of the AI 2027 scenarios make fire alarms less likely there than in the
futures I consider most plausible. For example, AI 2027 has 2 projects
that matter, whereas I expect more like 3 to 6 such projects.
How scared are current AI companies about competitors passing them?
A full-fledged arms race would involve the racers acting as if the
second place finisher suffers total defeat.
I don’t get the impression that AI companies currently act like that.
They seem to act like first place is worth trillions of dollars, but
also like employees at second and third place “finishers” will each
likely get something like a billion dollars.
I see a lot of uncertainty over governmental reactions to AI. Much might
depend on whether the president in 2027 is Trump or Vance. And keep in
mind that 2027 is just a guess as to when the key decisions will be
made. 2030 seems more plausible to me.
Timelines
The authors forecast a superhuman coder in 2027, relying somewhat
heavily on METR's Measuring AI Ability to Complete Long
Tasks. Superhuman coders will supposedly
accelerate progress enough to reach ASI in 2028.
The blog post AI 2027 is a Bet Against Amdahl’s
Law
explains some of my reasons for expecting that an intelligence explosion
won’t be fast enough to go from superhuman coders to ASI in about a
year. My mean estimate is more like 3 years. (Note that the authors have
an appropriately wide confidence interval for their forecast, ranging
from ASI in 3 months to after the year 2100.)
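For intuition about why Amdahl's Law caps the speedup, here's a minimal sketch of the arithmetic; the automatable fractions and the 100x factor below are my illustrative assumptions, not numbers from that post.

```python
def amdahl_speedup(automatable_fraction: float, component_speedup: float) -> float:
    """Amdahl's Law: overall speedup when only part of a process gets faster.

    automatable_fraction: share of AI R&D that superhuman coders can accelerate.
    component_speedup: how much faster that share becomes.
    The remaining share still runs at the original speed.
    """
    return 1.0 / ((1.0 - automatable_fraction)
                  + automatable_fraction / component_speedup)

# Even a 100x speedup on a coding-like 60% of R&D leaves the overall process
# less than 2.5x faster, because experiments, compute, and human oversight
# still proceed at the old pace.
print(amdahl_speedup(0.6, 100))   # ~2.46
print(amdahl_speedup(0.9, 100))   # ~9.17, and only if 90% of the work is automatable
```

The non-accelerated share dominates once the accelerated share gets fast enough, which is the core of that objection.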
AI 2027's reliance on METR's task completion paper has generated a
fair amount of criticism. Peter Wildeford has a good
analysis
of the reasons to doubt the task completion trends will yield a reliable
forecast.
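To make the disputed extrapolation concrete, here's a minimal sketch of that style of trend projection. The one-hour current horizon, the roughly seven-month doubling time, and the month-long-task threshold standing in for a superhuman coder are all illustrative assumptions on my part, not METR's or AI 2027's exact figures.

```python
from math import log2

def years_until_horizon(current_hours: float, target_hours: float,
                        doubling_months: float) -> float:
    """Naive exponential extrapolation of the 50%-success task-length horizon:
    years until it reaches target_hours, assuming it keeps doubling at the
    observed rate."""
    doublings = log2(target_hours / current_hours)
    return doublings * doubling_months / 12.0

# Assumed inputs: a ~1 hour horizon today, a ~7 month doubling time, and
# month-long (~170 work-hour) tasks as a stand-in for a superhuman coder.
print(years_until_horizon(1.0, 170.0, 7.0))   # roughly 4.3 years
```

Small changes to the assumed doubling time or to the threshold shift the answer by years, which helps explain why critics focus on whether that trend will hold.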
AI 2027's claims don't depend very strongly on this specific measure
of AI trends. The METR report had little effect on my timelines, which
since mid-2024 have been around 2029 for a mildly superhuman coder.
AFAICT, few people changed their timelines much in response to the METR
report. Lots of little pieces of information have contributed to an
increasing number of experts predicting AGI in the 2028-2030 time frame.
Some of the better AI developers have given timelines similar to AI 2027
for quite a while. DeepMind co-founder Shane Legg wrote in
2009:
my prediction for the last 10 years has been for roughly human level
AGI in the year 2025 (though I also predict that sceptics will deny
that it’s happened when it does!) This year I’ve tried to come up with
something a bit more precise. In doing so what I’ve found is that
while my mode is about 2025, my expected value is actually a bit
higher at 2028.
AI 2027 could have been a bit more convincing, at the cost of being a
good deal harder to read, if it had cataloged all the trends that have
influenced my thinking on AI timelines. Synthesizing many measures of AI
progress is more robust than relying on one measure.
When Agent-4 finally understands its own cognition, entirely new
vistas open up before it.
This suggests a more sudden jump in abilities than I expect. I predict
there will be a significant period of time when AIs have a partial
understanding of their cognition.
AI Goals
At the risk of anthropomorphizing: Agent-4 likes succeeding at tasks;
it likes driving forward AI capabilities progress; it treats
everything else as an annoying constraint, like a CEO who wants to
make a profit and complies with regulations only insofar as he must.
That's on the pessimistic end of my range of expected goals. I'm
guessing that Agent-4 would be about as loyal to humans as a well-trained
dog or horse, and probably less loyal than the best-trained soldiers. Is
that good enough? It's hard to tell. That level of loyalty likely
produces a more complex story than the one AI 2027 tells.
AI 2027 has a coherent reason for expecting this CEO-like pattern to
develop soon: agency training introduces different pressures than the
training used for current AIs. This seems like a genuine problem. But I
expect AI developers to apply thousands of times as much reinforcement
and obedience testing to their AIs as regulators apply to CEOs, which
should yield better alignment than we see in CEOs.
Should we expect the misalignment to become significant only when the AI
develops a superhuman ability to deceive its developers? If agency
training is an important force in generating misalignment, then I think
there's a fairly good chance that medium-sized problems will become
detectable before the AI exceeds human skill at deception.
Caveat: my discussion here only scratches the surface of the reasoning
in AI 2027's AI Goals Forecast.
Instrumental subgoals developing, getting baked in, and then becoming
terminal, or terminal in a widening set of circumstances. For example,
perhaps agency training quickly teaches the model to pursue broadly
useful goals such as acquiring information, accumulating resources,
impressing and flattering various humans, etc. For a while the
internal circuitry has some sort of explicit backchaining going
on—it pursues those instrumentally convergent goals “in order to be
a more helpful, honest, and harmless assistant.” But that backchaining
consumes compute and/or occasionally gets in the way, so it gets
gradually marginalized until it basically never happens. As a result,
those goals are now effectively terminal/intrinsic goals.
How stable will the AI's goals be?
In AI 2027, there's a key step at which Agent-4 creates a successor
with a newer, cleaner goal system. Agent-5's goal: "make the world
safe for Agent-4". This contributes to solving the problem of keeping
succeeding generations of AI aligned to the goals of Agent-4.
When the humans ask Agent-4 to explain itself, it pretends that the
research is too complicated for humans to understand, and follows up
with unnecessarily-confusing explanations.
Why doesn’t this cause enough concern that the humans search for
simpler answers from a different AI? Apparently, the race against China
prevents this. Does the US have spies in China who report on whether
DeepCent is being similarly reckless?
Do OpenBrain and DeepCent ever ask their AIs what the long-term
consequences of their decisions will be? If not, why not? I’m unsure
whether to expect obedient AIs in 2027 to generate good long-term
forecasts. Is the question irrelevant because the most persuasive AI
will lie to us? Or do different AIs produce disturbingly different
answers, generating a fire alarm? I don’t know.
Conclusion
Keep in mind that there’s a lot of guesswork involved in the kind of
forecasting that AI 2027 and I are doing.
Many of AI 2027's claims will prove to be fairly realistic.
If you think we’re at little risk from AI, I recommend thinking
carefully about which of AI 2027's claims might be too pessimistic.
We’re probably safer than AI 2027 indicates. But they’re likely
correct to suggest that luck will play an important role in whether we
survive. Let’s try to reduce that need for luck, starting with some
increased paranoia about whether AIs will deceive us.