Stopping dangerous AI: Ideal US behavior

Disclaimer: this post doesn’t have the answers, and it’s an unfinished draft. A future version may be valuable, but only if I revise or rewrite it. For now, you’re better off reading the sources linked from AI policy ideas: Reading list and Slowing AI: Reading list.

Set aside most of AI safety and focus on the speed of AI progress. What should the US government do?[1] This post assumes that it’s good to slow (dangerous) AI, especially near the end.

The ultimate goal is to prevent the deployment of powerful AI systems (that is, AI systems that would cause a catastrophe) until we learn how to make them safe. (This pretends alignment is monolithic.) There are three ways to do this: ensure powerful AI systems aren’t developed, ensure they aren’t deployed, or make the world so resilient that they don’t cause a catastrophe even if deployed. Focusing on development is most promising.

So, delay the development of powerful AI systems until they’re safe. (That’s probably too crude. Complications: (1) alignment progress is endogenous; (2) you have to actually pay the alignment tax, so leading labs’ lead time and safety awareness are crucial.)

Much of AI, like robotics, autonomous vehicles, some medical applications, and image generation, isn’t very dangerous.

The dream would be that we have a test to determine whether an AI system would cause a catastrophe, all large training runs are audited, and if a model fails the test, its training run is shut down. But (1) good tests don’t exist yet, and (2) the only known way to make AI safe is to make it not powerful enough to cause a catastrophe. Related: model evals and Yonadav Shavit’s Verifying Rules on Large-Scale Neural Network Training via Compute Monitoring (2023).


Ways governments can slow AI (this is more a list of policy levers than of policy proposals)

  • Ban/moratorium: monitoring training runs (see Shavit) + a ceiling on training compute

    • Lasting for a fixed time, until a safety-related condition is met, or until the government changes its mind; or a ceiling that automatically rises over time (e.g., 1e26 FLOP in 2023, doubling every year)

  • Miscellaneous standards & regulation

    • Evals

  • Export controls

    • Aimed at chips and the compute supply chain

    • Aimed at AI model access, software, and ideas

  • Track compute

    • To enable regulating training runs

  • Restrict publication and diffusion of ideas

    • E.g. by creating a mandatory research silo for work on the path to dangerous AI by leading labs

    • The vibe I get is that this is intractable

  • Support infosec

  • Maybe make an antitrust exemption for certain actions by AI labs

    • Some people are concerned that antitrust law would prevent labs from coordinating to slow down, not publish research, or not deploy systems. (It’s not clear that an exemption is necessary.)

    • Luke says: “Create a narrow antitrust safe harbor for AI safety & security collaboration. Frontier-model developers would be more likely to collaborate usefully on AI safety and security work [and, I add, developing and deploying cautiously] if such collaboration were more clearly allowed under antitrust rules.”

  • Incident reporting and tracking? This is useful for safety, and, like compute tracking, it enables future regulation. Also: creating an AI regulatory agency or otherwise building government capacity. (Crux: whether more, and better-informed, governance is net-positive on the margin. I think it’s positive EV but feel confused and unsure how to become unconfused.) Maybe also something with standards, like expanding NIST’s work on AI, and probably more sophisticated options I’m not aware of.

  • Expropriation

    • Expropriation for safety seems unrealistic

  • Help labs stop dangerous AI
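
To make the rising-ceiling option from the ban/moratorium bullet concrete, here is a minimal Python sketch of the arithmetic. It assumes the example schedule given there (1e26 FLOP in 2023, doubling every year). As a separate assumption not taken from this post, it estimates a run’s training compute with the common rule of thumb of about 6 FLOP per parameter per training token for dense transformers. The names `ceiling_flop`, `estimate_training_flop`, and `run_allowed` are hypothetical; this illustrates the schedule, not any actual or proposed US rule.

```python
# Minimal sketch of a rising training-compute ceiling.
# Assumption: the example schedule of 1e26 FLOP in 2023, doubling every year.
# Illustrative only; not an actual or proposed regulation.
BASE_YEAR = 2023
BASE_CEILING_FLOP = 1e26


def ceiling_flop(year: int) -> float:
    """Return the hypothetical training-compute ceiling (in FLOP) for `year`."""
    if year < BASE_YEAR:
        raise ValueError(f"schedule starts in {BASE_YEAR}")
    # Doubling every year: ceiling = base * 2^(year - base_year).
    return BASE_CEILING_FLOP * 2 ** (year - BASE_YEAR)


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough compute estimate for a dense transformer training run.

    Assumption: about 6 FLOP per parameter per training token (roughly 2 for
    the forward pass and 4 for the backward pass), a common rule of thumb.
    """
    return 6 * n_params * n_tokens


def run_allowed(n_params: float, n_tokens: float, year: int) -> bool:
    """True if the estimated training compute is at or below that year's ceiling."""
    return estimate_training_flop(n_params, n_tokens) <= ceiling_flop(year)


if __name__ == "__main__":
    # Hypothetical run: 1e12 parameters trained on 2e13 tokens.
    for year in (2023, 2024, 2025):
        print(year, f"{ceiling_flop(year):.0e}", run_allowed(1e12, 2e13, year))
```

For example, the hypothetical run above comes to roughly 1.2e26 FLOP: over the 2023 ceiling but under the 2024 ceiling of 2e26. In practice the hard part is verifying the inputs to such a check, which is what tracking and monitoring compute is meant to enable.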

An oft-mentioned kind of policy is clarifying or increasing AI developers’ liability for AI harms. I currently feel confused about this. There are negative (and some positive) externalities in AI development, deployment, publication, and (lack of) security. It would be nice to internalize the externalities.


“What about China?” People sometimes assert that slowing down AI progress would be bad or politically infeasible because China wouldn’t slow down, so China would “win” the “race.” This is questionable for several reasons, but the real reason it’s a bad objection to slowing AI is that labs and Western governments have tools that slow China down even more than they slow the West.

  • Publication & export controls on software/ideas/etc.

    • Decreasing publication from leading Western labs slows the West and slows China more

    • Siloing research within a group of leading Western labs would slow China without even slowing the West much

  • Export controls on hardware

  • Migration

    • Decrease China’s access to talent (by increasing migration to the West) and/or its access to Western training, knowledge, and connections (by decreasing migration to the West)

  • Security

    • Increase infosec, opsec, and cybersec (good for many reasons!)

  1. Note that, in the real world, the marginal nudge one should give the US government does not point in exactly the same direction as what it would do if it acted optimally.