Some thoughts on risks from narrow, non-agentic AI

Here are some concerns which have been raised about the development of advanced AI:

  • Power might become concentrated with agentic AGIs which are highly misaligned with humanity as a whole (the second species argument).

  • AI might allow power to become concentrated to an unprecedented extent with elites who are misaligned with humanity as a whole.

  • Competitive pressures to use narrow, non-agentic AIs trained on easily-measurable metrics might become harmful enough to cause a “slow-rolling catastrophe”. [Edit: it seems like this is not the intended interpretation of Paul’s argument in What Failure Looks Like; see discussion in the comments section. So I no longer fully endorse this section, but I’ve left it up for reference purposes.]

  • AI might make catastrophic conflicts easier or more likely; in other words, the world might become more vulnerable with respect to available technology.

  • AIs might be morally relevant, but be treated badly.

I’ve already done a deep dive on the second species argument, so in this post I’m going to focus on the others—the risks which don’t depend on thinking of AIs as autonomous agents with general capabilities. Warning: this is all very speculative; I’m mainly just trying to get a feeling for the intellectual terrain, since I haven’t seen many explorations of these concerns so far.

Inequality and totalitarianism

One key longtermist concern about inequality is that certain groups might get (semi)permanently disenfranchised; in other words, suboptimal values might be locked in. Yet this does not seem to have happened in the past: moral progress has improved the treatment of slaves, women, non-Europeans, and animals over the last few centuries, despite those groups starting off with little power. It seems to me that most of these changes were driven by the moral concerns of existing elites, backed by public sentiment in wealthy countries, rather than improvements in the bargaining position of the oppressed groups which made it costlier to treat them badly (although see here for an opposing perspective). For example, ending the slave trade was very expensive for Britain; the Civil War was very expensive for the US; and so on. Perhaps the key exception is the example of anti-colonialist movements—but even then, public moral pressure (e.g. opposition to harming non-violent protesters) was a key factor.

What would reduce the efficacy of public moral pressure? One possibility is dramatic increases in economic inequality. Currently, one limiting factor on inequality is the fact that most people have a significant amount of human capital, which they can convert to income. However, AI automation will make most forms of human capital much less valuable, and therefore sharply increase inequality. This didn’t happen to humans after the industrial revolution, because human intellectual skills ended up being more valuable in absolute terms after a lot of physical labour was automated. But it did happen to horses, who lost basically all their equine capital.

Will any human skills remain valuable after AGI, or will we end up in a similar position to horses? I expect that human social skills will become more valuable even if they can be replicated by AIs, because people care about human interaction for its own sake. And even if inequality increases dramatically, we should expect the world to also become much richer, making almost everyone wealthier in absolute terms in the medium term. In particular, as long as the poor have levels of political power comparable to those they have today, they can use that power to push the rich to redistribute wealth. This will be easiest on a domestic level, but it also seems that citizens of wealthy countries are currently sufficiently altruistic to advocate for transfers of wealth to poorer countries, and will do so even more if international inequality grows.

So to a first approximation, we can probably think about concerns about inequality as a subset of concerns about preventing totalitarianism: mere economic inequality within a (somewhat democratic) rule of law seems insufficient to prevent the sort of progress that is historically standard, even if inequality between countries dramatically increases for a time. By contrast, given access to AI technology which is sufficiently advanced to confer a decisive strategic advantage, a small group of elites might be able to maintain power indefinitely. The more of the work of maintaining control is outsourced to AI, the smaller that group can be; the most extreme case would be permanent global totalitarianism under a single immortal dictator. Worryingly, if there's no realistic chance of such rulers being overthrown, they could get away with much worse behaviour than most dictators (North Korea is a salient example). Such scenarios seem more likely in a world where progress in AI is rapid and leads to severe inequality. In particular, economic inequality makes subversion of our political systems easier, and inequality between countries makes it more likely for an authoritarian regime to gain control of the world.

In terms of direct approaches to preventing totalitarianism, I expect it will be most effective to apply existing approaches (e.g. laws against mass surveillance) to new applications powered by AI; but it’s likely that there will also be novel and valuable approaches. Note, finally, that these arguments assume a level of change comparable to the industrial revolution; however, eventually we’ll get far beyond that (e.g. by becoming posthuman). I discuss some of these long-term considerations later on.

A slow-rolling catastrophe

I’ll start this section by introducing the key idea of Paul Christiano’s “slow-rolling catastrophe” (even though it’s not framed directly in terms of competitive pressures; I’ll get to those later): “Right now humans thinking and talking about the future they want to create are a powerful force that is able to steer our trajectory. But over time human reasoning will become weaker and weaker compared to new forms of reasoning honed by trial-and-error. Eventually our society’s trajectory will be determined by powerful optimization with easily-measurable goals rather than by human intentions about the future.”

In what different domains might we see this sort of powerful optimisation? Paul identifies a few:

  • Corporations will focus on “manipulating consumers, capturing regulators, extortion and theft”.

  • “Instead of actually having an impact, [investors] will be surrounded by advisors who manipulate them into thinking they’ve had an impact.”

  • Law enforcement will be driven by “creating a false sense of security, hiding information about law enforcement failures, suppressing complaints, and coercing and manipulating citizens.”

  • Legislation may be optimized for “undermining our ability to actually perceive problems and constructing increasingly convincing narratives about where the world is going and what’s important.”

I’m skeptical of this argument for a few reasons. The main one is that these are highly exceptional claims, which demand concomitantly exceptional evidence. So far we have half a blog post by Paul, plus this further investigation by Sam Clarke. Anecdotally it seems that a number of people find this line of argument compelling, but unless they have seen significant non-public evidence or arguments, I cannot see how this is justifiable. (And even then, the lack of public scrutiny should significantly reduce our confidence in these arguments.)

In particular, it seems that such arguments need to overcome two important “default” expectations about the deployment of new technology. The first is that such technologies are often very useful, with their benefits significantly outweighing their harms. So if narrow AI becomes very powerful, we should expect it to improve humanity’s ability to steer our trajectory in many ways. As one example, even though Google search is primarily optimised for easily-measurable metrics, it still gives us much better access to information than we had before it. Some more general ways in which I expect narrow AI to be useful:

  • Search engines and knowledge aggregators will continue to improve our ability to access, filter and organise information.

  • AI-powered collaboration tools such as automated translation and better virtual realities will allow us to coordinate and communicate more effectively.

  • Narrow AI will help us carry out scientific research, by finding patterns in data, automating tedious experimental steps, and solving some computationally-intensive problems (like protein folding).

  • AI tools will help us to analyse our habits and behaviour and make interventions to improve them; they’ll also aggregate this information to allow better high-level decision-making (e.g. by economic policymakers).

  • AI will help scale up the best educational curricula and make them more personalised.

  • Narrow AI will generally make humanity more prosperous, allowing us to dedicate more time and effort to steering society’s trajectory in beneficial directions.

The second default expectation about technology is that, if using it in certain ways is bad for humanity, we will stop people from doing so. This is a less reliable extrapolation—there are plenty of seemingly-harmful applications of technology which are still occurring. But note that we’re talking about a slow-rolling catastrophe—that is, a situation which is unprecedentedly harmful. And so we should expect an unprecedented level of support for preventing whatever is causing it, all else equal.

So what grounds do we have to expect that, despite whatever benefits these systems have, their net effect on our ability to control the future will be catastrophic, in a way that we’re unable to prevent? I’ll break down Paul’s argument into four parts, and challenge each in turn:

  1. The types of problems discussed in the examples above are plausible.

  2. Those problems will “combine with and amplify existing institutional and social dynamics that already favor easily-measured goals”.

  3. “Some states may really put on the brakes, but they will rapidly fall behind economically and militarily.”

  4. “As the system becomes more complex, [recognising and overcoming individual problems] becomes too challenging for human reasoning to solve directly and requires its own trial and error, and at the meta-level the process continues to pursue some easily measured objective (potentially over longer timescales). Eventually large-scale attempts to fix the problem are themselves opposed by the collective optimization of millions of optimizers pursuing simple goals.”

Premise 1

To be clear, I don’t dispute that there are some harmful uses of AI. Development of more destructive weapons is the most obvious; I also think that emotional manipulation is a serious concern, and discuss it in the next section. But many of the things Paul mentions go well beyond basic manipulation—instead they seem to require quite general reasoning and planning capabilities applied over long time periods. This is well beyond what I expect narrow systems trained on easily-measurable metrics in specific domains will be capable of achieving. How do you train on a feedback signal about whether a certain type of extortion will get you arrested a few months later? How do law enforcement AIs prevent surveys and studies from revealing the true extent of law enforcement failures? If it’s via a deliberate plan to suppress them while also overcoming human objections to doing so, then that seems less like a narrow system “optimising for an easily measured objective” and more like an agentic and misaligned AGI. In other words, the real world is so complex and interlinked that in order to optimise for an easily measured objective to an extreme extent, in the face of creative human opposition, AIs will need to take into account a broad range of factors that aren’t easily measured, which we should expect to require reasoning roughly as powerful and general as humans’. (There are a few cases in which this might not apply—I’ll discuss them in the next section.)

Now, we should expect that AIs will eventually be capable of human-level reasoning. But these don’t seem like the types of AIs which Paul is talking about here: he only raises the topic of “systems that have a detailed understanding of the world, which are able to adapt their behavior in order to achieve specific goals” after he’s finished discussing slow-rolling catastrophes. Additionally, the more general our AIs are, the weaker the argument that they will pursue easily-measurable targets, because it’ll be easier to train them to pursue vague targets. For example, human intelligence is general enough that we require very few examples to do well on novel tasks; and we can follow even vague linguistic instructions. So Paul needs to explain why AIs that are capable of optimisation which is flexible enough to be worth worrying about won’t be able to acquire behaviour which is nuanced enough to be beneficial. If his answer is that they’ll learn the wrong goals at first and then deceive us or resist our attempts to correct them, then it seems like we’re back to the standard concerns about misaligned agentic AGIs, rather than a separate concern about optimisation for easily-measurable metrics.

Premise 2

Secondly, let’s talk about existing pressures towards easily-measured goals. I read this as primarily referring to competitive political and economic activity—because competition is a key force pushing people towards tradeoffs which are undesirable in the long term. Perhaps the simplest example is the common critique of capitalism: money is the central example of an easily-measured objective, and under capitalism other values lose ground to the pursuit of it. If corporations don’t make use of AIs which optimise very well for easily-measurable metrics like short-term profits, then they might be driven bankrupt by competitors, or else face censure from their shareholders (who tend to feel less moral responsibility for the behaviour of the corporations they own than other stakeholders do). And so we might think that, even if corporations or institutions have some suspicions about specific AI tools, they’ll just have to use those ones rather than alternatives which optimise less hard for those metrics but are more beneficial for society overall.

What I’m skeptical about, though, is how strong these pressures will be. Realistically speaking, what will make law enforcement departments hand over enough power to AIs that they can implement behaviour that isn’t desired by those law enforcement departments, such as widespread deception and coercion? (Presumably not the need to fight standard types of crime, in an era where it’ll be so easy to track and measure behaviour.) Will it be legislative pressure? Will voters really be looking so closely at political activity that politicians have no choice but to pass laws which are ruthlessly optimised for getting reelected? Right now I’d be very happy if the electorate started scrutinising laws at all! Similarly, I’d be happy if public opinion had any reasonable ability to change the police, or the military, or government bureaucracies. Far from being threatened by over-optimisation, most of our key institutions seem to me to be near-invulnerable to it. And consider that at this point, we’ll also be seeing great prosperity from the productive uses of AI (what Drexler calls Paretotopia). I expect sufficient abundance that it’ll be easy to overlook massive inefficiencies in all sorts of places, and that the pressure to hand over power to blindly-optimising algorithms will be very low.

Even if there’s great abundance in general, though, and little pressure on most societal institutions, we might worry that individual corporations will be forced to make such tradeoffs in order to survive (or just to make more profit). This seems a little at odds with our current situation, though—the biggest tech companies have a lot of slack (often in the form of several hundreds of billions of dollars of cash reserves). Additionally, it seems fairly easy for governments to punish companies for causing serious harm; even the biggest companies have very little ability to resist punitive legislative measures if politicians and the public swing against them (especially since anti-corporate sentiment is already widespread). The largest sectors in modern economies—including financial systems, healthcare systems, educational systems, and so on—are very heavily regulated to prevent risks much smaller than the ones Paul predicts here.

What might make the economy so much more competitive that such companies are pushed to take potentially catastrophic actions and risk severe punishment? I think there are plausibly some important domains in which we’ll see races to the bottom (e.g. newsfeed addictiveness), but I don’t think most companies will rely heavily on these domains. Instead, I’d characterise the economy as primarily driven by innovation and exploration rather than optimisation of measurable metrics, and innovation is exactly the type of thing which can’t be measured easily. Perhaps if we ended up in a general technological stagnation, companies would have to squeeze every last drop of efficiency out of their operations, and burn a lot of value in the process. And perhaps if the world stagnates in this way, we’ll have time to collect enough training data that AI-powered “trial and error” will become very effective. But this scenario is very much at odds with the prospect of dramatic innovation in AI, which will instead open up new vistas of growth, and lead to far-reaching changes which render much previous data obsolete. So internal pressure on our societies doesn’t seem likely to cause this slow-rolling catastrophe, absent specific manipulation of certain pressure points (as I’ll discuss in the next section).

Premise 3

Thirdly: what about external pressure, from the desire not to fall behind economically or militarily? I’m pretty uncertain about how strong an incentive this provides; the US, for example, doesn’t seem very concerned right now about falling behind China. And in a world where change is occurring at a rapid and bewildering pace, I’d expect that many will actually want to slow that change down. But to be charitable, let’s consider a scenario of extreme competition, e.g. a Cold War between the US and China. In that case, we might see each side willing to throw away some of their values for a temporary advantage. Yet military advantages will likely rely on the development of a few pivotal technologies, which wouldn’t require large-scale modifications to society overall (although I’ll discuss dangers from arms races in the next section). Or perhaps the rivals will focus on boosting overall economic growth. Yet this would also incentivise increased high-level oversight and intervention into capitalism, which actually helps prevent a lot of the harmful competitive dynamics I discussed previously (e.g. increasingly addictive newsfeeds).

Premise 4

Fourthly: Paul might respond by saying that the economy will by then have become some “incomprehensible system” which we can’t steer, and which “pursues some easily measured objective” at the meta level. But there are also a lot of comprehensibility benefits to a highly automated economy compared with our current economy, in which a lot of the key variables are totally opaque to central planners. Being able to track many more metrics, and analyse them with sophisticated machine learning tools, seems like a big step up. And remember that, despite this, Paul is predicting catastrophic outcomes from a lack of measurability! So the bar for incomprehensibility is very high—not just that we’ll be confused by some parts of the automated economy, but that we won’t have enough insight into it to notice a disaster occurring. This may be possible if there are whole chunks of it which involve humans minimally or not at all. But as I’ve argued already, if AIs are capable and general enough to manage whole chunks of the economy in this way, they’re unlikely to be restricted to pursuing short-term metrics.

Further, I don’t see where the meta-level optimisation for easily measured objectives comes from. Is it based on choices made by political leaders? If only they optimised harder on easily-measured objectives like GDP or atmospheric carbon concentrations! Instead it seems that politicians are very slow to seize opportunities to move towards extreme outcomes. Additionally, we have no good reason to think that “the collective optimization of millions of optimizers pursuing simple goals” will be pushing in a unified direction, rather than mostly cancelling each other out.

Overall I am highly skeptical about the “slow-rolling catastrophe” in the abstract. I think that in general, high-level general-purpose reasoning skills (whether in humans or AIs) will remain much more capable of producing complex results than AI trial and error, even if people don’t try very hard to address problems arising from the latter (we might call this position “optimism of the intellect, pessimism of the will”). I’m more sympathetic to arguments identifying specific aspects of the world which may become increasingly important and increasingly vulnerable; let’s dig into these now.

A vulnerable world

This section is roughly in line with Bostrom’s discussion of the vulnerable world hypothesis, although at the end I also talk about some ways in which new technologies might lead to problematic structural shifts rather than direct vulnerabilities. Note that I discuss some of these only briefly; I’d encourage others to investigate them in greater detail.

Manipulation

It may be the case that human psychology is very vulnerable to manipulation by AIs. This is the type of task on which a lot of data can be captured (because there are many humans who can give detailed feedback); the task is fairly isolated (manipulating one human doesn’t depend much on the rest of the world); and the data doesn’t become obsolete as the world changes (because human psychology is fairly stable). Even assuming that narrow AIs aren’t able to out-argue humans in general, they may nevertheless be very good at emotional manipulation and subtle persuasion, especially against humans who aren’t on their guard. So we might be concerned that some people will train narrow AIs which can be used to manipulate people’s beliefs or attitudes. We can also expect that there will be a spectrum of such technologies: perhaps the most effective will be direct interaction with an AI able to choose an avatar and voice for itself. AIs might also be able to make particularly persuasive films, or ad campaigns. One approach I expect to be less powerful, but perhaps relevant early on, is an AI capable of instructing a human on how to be persuasive to another human.

How might this be harmful to the long-term human trajectory? I see two broad possibilities. The first is large-scale rollouts of weaker versions of these technologies, for example by political campaigns in order to persuade voters, which would harm our ability to make good collective decisions; I’ll call this the AI propaganda problem. (This might also be used by corporations to defend themselves from the types of punishments I discussed in the previous section.) The second is targeted rollouts of more powerful versions of this technology, for example aimed at specific politicians by special interest groups, which will allow the attackers to persuade or coerce the targets into taking certain actions; I’ll call this the AI mind-hacking problem. I expect that, if mind-hacking is a real problem we will face, then the most direct forms of it will quickly become illegal. But enforcing such bans will require detecting mind-hacking. So tools which can distinguish an AI-generated avatar from a video stream of a real human would be useful; but I expect that they will tend to be one step behind the most sophisticated generative tools (as is currently the case with adversarial examples and in cybersecurity). Meanwhile it seems difficult to prevent AIs from being trained to manipulate humans by making persuasive videos, because by then I expect AIs to be crucial in almost every step of video production.

However, this doesn’t mean that detection will be impossible. Even if there’s no way to differentiate between a video stream of a real human and an AI avatar, in order to carry out mind-hacking the AI will need to display some kind of unusual behaviour; at that point it can be flagged and shut down. Such detection tools might also monitor the mental states of potential victims. I expect that there would also be widespread skepticism about mind-hacking at first, until convincing demonstrations help muster the will to defend against it. Eventually, if humans are really vulnerable in this way, I expect protective tools to be as ubiquitous as spam filters, although it’s not clear whether the offense-defense balance will be as favourable to defense as it is in the case of spam. Yet because elites will be the most valuable targets for the most extreme forms of mind-hacking, I expect prompt action against it.

AI propaganda, by contrast, will be less targeted, and will therefore likely have weaker effects on average than mind-hacking (although if it’s deployed more widely, it may be more impactful overall). I think the main effect here would be to make totalitarian takeovers more likely, because propaganda could provoke strong emotional reactions and political polarisation, which could then be used to justify extreme actions. It would also be much more difficult to clamp down on than direct mind-hacking; and it’d target an audience which is less informed and less likely to take protective measures than elites.

One closely related possibility is that of AI-induced addiction. We’re already seeing narrow AI used to make various social media more addictive. However, even if these applications become as addictive as heroin, that need not be catastrophic: plenty of people manage to avoid heroin, because its addictiveness is widely known. Even though certain AI applications are much easier to start using than heroin, I expect similarly widespread knowledge to arise, along with tools (such as website blockers) that help people avoid addiction. So it seems plausible that AI-driven addiction will be a large public health problem, but not a catastrophic threat.

The last possibility along these lines I’ll discuss is AI-human interactions replacing human-human interactions—for example, if AI friends and partners become more satisfying than human friends and partners. Whether this would actually be a bad outcome is a tricky moral question; but either way, it definitely opens up more powerful attack vectors for other forms of harmful manipulation, such as the ones previously discussed.

Centralised control of important services

It may be the case that our reliance on certain services—e.g. the Internet, the electrical grid, and so on—becomes so great that their failure would cause a global catastrophe. If these services become more centralised—e.g. because it’s efficient to have a single AI system which manages them—then we might worry that a single bug or virus could wreak havoc.

I think this is a fairly predictable problem that normal mechanisms will handle, though, especially given widespread mistrust of AI, and skepticism about its robustness.

Structural risks and destructive capabilities

Zwetsloot and Dafoe have argued that AI may exacerbate (or be exacerbated by) structural problems. The possibility which seems most pressing is AI increasing the likelihood of great power conflict. As they identify, the cybersecurity dilemma is a relevant consideration; and so is the potential insecurity of second-strike capabilities. Novel weapons may also have very different offense-defense balances, or costs of construction. We currently walk a fine line between nuclear weapons being sufficiently easy to build to allow Mutually Assured Destruction, and being sufficiently hard to build to prevent further proliferation. If novel weapons are many times more powerful than nuclear weapons, then preventing their proliferation becomes correspondingly more important. However, I don’t have much to say right now on this topic, beyond what has already been said.

A digital world

We should expect that we will eventually build AIs which are moral patients, and which are capable of suffering. If these AIs are more economically useful than other AIs, we may end up exploiting them at industrial scales, in a way analogous to factory farming today.

This possibility relies on several confusing premises. First is the question of moral patienthood. It seems intuitive to give moral weight to any AIs that are conscious, but if anything this makes the problem thornier. How can we determine which AIs are conscious? And what does it even mean, in general, for AIs very different from current sentient organisms to experience positive or negative hedonic states? Shulman and Bostrom discuss some general issues in the ethics of digital minds, but largely skim over these most difficult questions.

It’s easier to talk about digital minds which are very similar to human minds—in particular, digital emulations of humans (aka ems). We should expect that ems differ from humans mainly in small ways at first—for example, they will likely feel more happiness and less pain—and then diverge much more later on. Hanson outlines a scenario where ems, for purposes of economic efficiency, are gradually engineered to lack many traits we consider morally valuable in our successors, and then end up dominating the world. Although I’m skeptical about the details of his scenario, it does raise the crucial point that the editability and copyability of ems undermine many of the safeguards which prevent dramatic value drift in our current civilisation.

Even aside from resource constraints, though, other concerns arise in a world containing millions or billions of ems. Because it’s easy to create and delete ems, it will be difficult to enforce human-like legal rights for them, unless the sort of hardware they can run on is closely monitored. But centralised control over hardware comes with other problems—in particular, physical control over hardware allows control over all the ems running on it. And although naturally more robust than biological humans in many ways, ems face other vulnerabilities. For example, once most humans are digital ems, computer viruses will be a much larger (and potentially existential) threat.

Conclusion

Based on this preliminary exploration, I’m leaning towards thinking about risks which might arise from the development of advanced narrow, non-agentic AI primarily in terms of the following four questions:

  1. What makes global totalitarianism more likely?

  2. What makes great power conflict more likely?

  3. What makes misuse of AIs more likely or more harmful?

  4. What vulnerabilities may arise for morally relevant AIs or digital emulations?