Reframing the AI Risk
Follow-up to: Reshaping the AI Industry: Straightforward Appeals to Insiders
Introduction
The central issue with convincing people of the AI Risk is that the arguments for it are not respectable. In the public consciousness, the well’s been poisoned by the media, which relegated AGI to the domain of science fiction. In technical circles, the AI Winter is to blame — there’s a stigma against expecting AGI in the short term, because the field’s been burned in the past.
As such, being seen taking the AI Risk seriously is bad for your status. It wouldn’t advance your career, it wouldn’t receive popular support or peer support, it wouldn’t get you funding or an in with powerful entities. It would waste your time, if not mark you as a weirdo.
The problem, I would argue, lies only partly in the meat of the argument. Certainly, the very act of curtailing AI capabilities research would step on some organizations’ toes, and mess with people’s careers. Some of the resistance is undoubtedly motivated by these considerations.
It’s not, however, the whole story. If it were, we could’ve expected widespread public support, and political support from institutions which would be hurt by AI proliferation.
A large part of the problem lies in the framing of the arguments. The specific concept of AGI and risks thereof is politically poisonous, parsed as fictional nonsense or a social faux pas. And yet this is exactly what we reach for when arguing our cause. We talk about superintelligent entities worming their way out of boxes, make analogies to human superiority over animals and our escape from evolutionary pressures, extrapolate to a new digital species waging war on humanity.
That sort of talk is not popular with anyone. The very shape it takes, the social signals it sends, dooms it to failure.
Can we talk about something else instead? Can we reframe our arguments?
The Power of Framing
Humanity has developed a rich suite of conceptual frameworks to talk about the natural world. We can view it through the lens of economics, of physics, of morality, of art. We can emphasize certain aspects of it while abstracting others away. We can take a single set of facts, and spin innumerable different stories out of them, without even omitting or embellishing any of them — simply by playing with emphases.
The same ground-truth reality can be comprehensively described in many different ways, simply by applying different conceptual frameworks. If humans were ideal reasoners, the choice of framework or narrative wouldn’t matter — we would extract the ground-truth facts from the semantics, and reach the conclusion we were always going to reach.
We are not, however, ideal reasoners. What spin we give to the facts matters.
The classical example goes as follows:

Participants were asked to choose between two treatments for 600 people affected by a deadly disease. Treatment A was predicted to result in 400 deaths, whereas treatment B had a 33% chance that no one would die but a 66% chance that everyone would die. This choice was then presented to participants either with positive framing, i.e. how many people would live, or with negative framing, i.e. how many people would die.

Framing | Treatment A | Treatment B
Positive | “Saves 200 lives” | “A 33% chance of saving all 600 people, 66% possibility of saving no one.”
Negative | “400 people will die” | “A 33% chance that no people will die, 66% probability that all 600 will die.”

Treatment A was chosen by 72% of participants when it was presented with positive framing (“saves 200 lives”), dropping to 22% when the same choice was presented with negative framing (“400 people will die”).
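Spelling out the arithmetic (reading the study’s 33%/66% figures as 1/3 and 2/3), the two descriptions of each treatment are complements of one another, and both treatments have the same expected outcome:

$$\text{Treatment A:}\quad 200 \text{ saved} \iff 600 - 200 = 400 \text{ dead}$$

$$\text{Treatment B:}\quad \mathbb{E}[\text{saved}] = \tfrac{1}{3}\cdot 600 = 200, \qquad \mathbb{E}[\text{dead}] = \tfrac{2}{3}\cdot 600 = 400$$

Nothing about the options changes between the two rows of the table; the only thing that changes is which side of the ledger gets named.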
As another example, we can imagine two descriptions of an island — one that waxes rhapsodic on its picturesque landscapes, and one that dryly lists the island’s contents in terms of their industrial uses. One would imagine that reading one or the other would have different effects on the reader’s desire to harvest that island, even if both descriptions communicated the exact same set of facts.
More salient examples exist in the worlds of journalism and politics — these industries have developed advanced tools for telling any story in a way that advances the speaker’s agenda.
Fundamentally, language matters. The way you speak, the conceptual handles you use, the facts you emphasize and the story you tell, all have social connotations that go beyond the literal truths of your statements.
And the AGI frame is, bluntly, a bad one. To those outside our circles, to anyone not feeling charitable, it communicates detachment from reality, fantastical thinking, overhyping, low status.

[Successful policies] allow people to continue to pretend to be trying to get the thing they want to pretend to want while actually getting more other things they actually want even if they can deny it. — Robin Hanson
On top of that, framing has disproportionate effects on people with domain knowledge. Trying to convince a professional of something while using a bad frame is a twice-doomed endeavor.
What Frame Do We Want?
We don’t have to use the AGI frame, I would argue. If the problem is with specific terms, such as “intelligence” and “AGI”, we can start by tabooing them and other “agenty” terms, then seeing what convincing arguments we can come up with under these restrictions.
More broadly, we can repackage our arguments using a different conceptual framework — the way a poetic description of an island could be translated into utilitarian terms to advance the cause of resource-extraction. We simply have to look for a suitable one. (I’ll describe a concrete approach I consider promising in the next section.)
What we need is a frame of argumentation that is, at once:
Robust. It isn’t a lie or mischaracterization, and wouldn’t fall apart under minimal scrutiny. It is, fundamentally, a valid way to discuss what we’re currently calling “the AI Risk”.
Respectable. Being seen acting on it doesn’t cost people social points, and indeed, grants them social points. (Alternatively, not acting on it once it’s been made common knowledge costs social points.)
Safety-promoting. It causes people/companies to act in ways that reduce the AI Risk.
Also, as Rob notes:

Info about AGI propagates too slowly through the field, because when one ML person updates, they usually don’t loudly share their update with all their peers. [...] On a gut level, they see that they have no institutional home and no super-widely-shared ‘this is a virtuous and respectable way to do science’ narrative.
By implication, there’s a fair number of AI researchers who are “sold” on the AI Risk, but who can’t publicly act on that belief because it’d have personal costs they’re not willing to pay. Finding a frame that would be beneficial to be seen supporting would flip that dynamic: it would allow them to rally behind it, solve the coordination problem.
Potential Candidate
(I suggest taking the time to think about the problem on your own, before I potentially bias you.)
It seems that any effective framing would need to talk about AI systems as volitionless mechanisms, not agents. From that, a framework naturally offers itself: software products and integrity thereof.
It’s certainly a valid way to look at the problem. AI models are software, and they’re used for the same tasks mundane software is. More parallels:
Modern large software is often an incomprehensible mess of code, and we barely understand how it works — much like ML models.
This incomprehensibility gives rise to a wide variety of bugs and unintended behaviors, whose severity and potential for catastrophic failure scale with the complexity of the application.
Poorly-audited software contains a lot of security vulnerabilities and instabilities. So does AI.
Much like security, Alignment Won’t Happen By Accident.
Do What I Mean is the equivalent of the AI control problem: how can we tell the program what we really want, instead of what we technically programmed it to do?
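To make that last parallel concrete, here is a minimal toy sketch of the gap between what we programmed and what we meant. (The scenario, names, and code are invented purely for illustration; they aren’t drawn from any real system.)

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    email: str


def deduplicate(users: list[User]) -> list[User]:
    """Remove 'duplicate' users, exactly as literally specified: same name == duplicate."""
    seen_names = set()
    kept = []
    for user in users:
        if user.name not in seen_names:  # what we programmed: key on name alone
            seen_names.add(user.name)
            kept.append(user)
    return kept


users = [
    User("Alex Smith", "alex.smith@example.com"),
    User("Alex Smith", "a.smith@example.org"),  # a different person who shares the name
]

# What we meant: keep both distinct people.
# What we get: one of them is silently dropped, because the program does
# what we technically told it to do, not what we wanted.
print(deduplicate(users))
```

The program behaves flawlessly by its own specification; the failure lives entirely in the gap between the specification and the intent. That is the same gap Do What I Mean points at, just at a much smaller scale.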
Most people would agree that putting a program that was never code-audited and couldn’t be bug-fixed in charge of critical infrastructure is madness. That, at least, should be a “respectable” way to argue for the importance of interpretability research, and the foolishness of putting ML systems in control of anything important.
Mind, “respectable” doesn’t mean “popular” — software security/reliability isn’t exactly most companies’ or users’ top priority. But it’s certainly viewed with more respect than the AI Risk. If we argued that integrity is especially important with regard to this particular software industry, we might get somewhere.
It wouldn’t be smooth sailing, even then. We’d need to continuously argue that fixing “bugs” only after a failure has occurred “in the wild” is lethally irresponsible, and there would always be people trying to lower the standards for interpretability. But that should be relatively straightforward to oppose.
This much success would already be good. It would motivate companies that plan to use AI commercially to invest in interpretability, and make interpretability-focused research & careers more prestigious.
It wouldn’t decisively address the real issue, however: AI labs conducting in-house experiments with large ML models. Some non-trivial work would need to be done to expand the frame — perhaps developing a suite of arguments where sufficiently powerful “glitches” could “spill over” into the environment. Making allusions to nuclear power and pollution, and borrowing some language from those fields, might be a good way to start on that.
There would be some difficulties in talking about concrete scenarios, since they often involve AI models acting in unmistakably intelligent ways. But, for example, Paul Christiano’s story would work with minimal adjustments, since the main “vehicle of agency” there is the human economy.
To further ameliorate this problem, we can also imagine rolling out our arguments in stages. First, we may popularize the straightforward “AI as software” case that argues for interpretability and control of deployed models, as above. Then, once the language we use has been accepted as respectable and we’ve expanded the Overton Window accordingly, we may extrapolate, and discuss concrete examples that involve AI models exhibiting agenty behaviors. If we have sufficient momentum, they should be accepted as natural extensions of established arguments, instead of being instinctively dismissed.