Code Generation as an AI Risk Setting

Historically, it has been difficult to persuade people of the likelihood of AI risk, because the examples tend to sound “far-fetched” to audiences not already bought in on the premise. One particular problem with many traditional framings of AI takeover is that most people struggle to imagine how, say, “a robot programmed to bake maximum pies” figures out how to code, locates its own source code, copies itself elsewhere over an internet connection, and then ends the world.

There’s a major logical leap there: “pie-baking” and “coding” are done by different categories of agent in our society, so it feels fundamentally odd to imagine a single agent capable of both. That oddness makes it seem as though we must be far away from any system that general, which relegates safety concerns to the status of a philosophical exercise.

I want to make the case that the motivating example we should really be using is automatic code generation. Here’s a long list of reasons why:

  • It’s obvious to people why and how a system that is good at generating code could, if given an open-ended task, generate code to copy itself: keeping a redundant copy running elsewhere is a basic system-reliability precaution that human engineers would also take.

  • Non-experts are already afraid of unrestrained hackers and of large tech companies building software products that damage society; an unaccountable AI doing the same things fits neatly into an existing emotional narrative.

  • For software people (whom we most need to convince), the problem of unexpected behaviors from code is extremely intuitive, as is the fact that codebases are invariably too complex for any human to be certain of what they’ll do before they’re run.

  • Code generation does seem to be getting dramatically better, and the memetic/media environment is ripe for people to decide how to feel about these capabilities.

  • Nearly all conceivable scalable prosaic alignment solutions will require some degree of “program verification”: making sure that code isn’t being run with an accidentally terrible utility function, or verifying the outputs of other AIs via code-checking Tool AIs. So we want substantial overlap between the AI safety and AI codegen communities.

  • The “alignment problem” already exists in nearly all large software engineering projects: it’s very difficult to specify ahead of time what you want a program to do, so we mostly just run codebases and see what happens. (The toy sketch after this list makes this concrete.)

  • All of the concerns around “the AI learns to use Rowhammer to escape” feel much more obvious when you’re building a code-generator.

  • We can even motivate the problem by giving the AI the objective “make sure that other code-generating AIs don’t misbehave”. This is open-ended in a way that obviously makes it a utility-maximizer, and it preempts the usual techno-optimistic response of “we’ll just build auditor AIs” by taking the alignment of those auditors as the starting premise.

  • The distinction between act-based AIs and EUMs (expected utility maximizers) is obvious in the case of code generation. Similarly, the idea of Safety via Debate is closely related to code-review processes.

  • Software project generation capabilities seem both necessary and possibly sufficient for FOOM/takeover scenarios.

  • Ultimately, the people in government and companies most sympathetic to high-tech risk mitigation are the people who think about cybersecurity, so scaring them wins us a very useful ally. (It’s also a community with plenty of people who have the “security mindset” needed for many empirical alignment scenarios.)
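
To make concrete the point above about how hard it is to specify programs ahead of time, here is a toy sketch (my own illustrative Python example; the function names and the “generated” implementation are hypothetical, not taken from any real system). The implementation satisfies every property its author thought to write down, yet still behaves in a way nobody asked for:

    # Hypothetical illustration: a "spec" written as assertions, plus a
    # generated implementation that meets the spec while doing something
    # the specifier never intended.

    def dedupe(items):
        """Generated implementation: passes check_spec below, but silently
        reorders the input, which the spec never mentions."""
        return sorted(set(items))

    def check_spec(impl):
        # The "specification": every property the author thought to write down.
        assert impl([]) == []
        assert impl([1, 1, 2]) == [1, 2]
        assert set(impl([3, 1, 3, 2])) == {1, 2, 3}
        print("spec satisfied")

    check_spec(dedupe)        # prints "spec satisfied"
    print(dedupe([3, 1, 2]))  # [1, 2, 3]: the original order is lost

Nothing here is exotic; it is the everyday gap between what we wrote down and what we actually wanted, and that gap is exactly where the alignment problem lives.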

On the other hand, there is some risk that focusing on code generation increases its public salience, and therefore investment in it; but that seems likely to happen anyway. Code generation is also the most obvious path towards recursive self-improvement, so drawing attention to it may accelerate AI capabilities; but again, this already seems to be happening whether or not we discuss it.

What do people think of this as a framing device?