Slowing AI: Interventions
Disclaimer: this post is underdeveloped and doesn’t have the answers. Hopefully a future version will be valuable, but I’m mostly posting this to help facilitate brainstorms and dialogues in my personal conversations.
What important actions related to slowing AI could actors take; what levers do they have?
I haven’t thought much about this yet. There are some lists and analyses that aren’t focused on slowing AI. Copying from “Actors’ levers” in “Slowing AI: Reading list”:
- “Affordances” in “Framing AI strategy” (Stein-Perlman 2023)
- Various private collections of AI policy ideas, notably including “AI Policy Ideas Database” (private work in progress)
- Current UK government levers on AI development (Hadshar 2023)
- Existential and global catastrophic risk policy ideas database (filter for “Artificial intelligence”) (Sepasspour et al. 2022)
Rough short lists of actors’ levers relevant to slowing AI:
- US government levers (see also my List of lists of government AI policy ideas):
  - Ban or moratorium
  - Standards & regulation
  - Expropriation for slowing
  - Liability: affect companies’ liability for misuse of their AI
  - Migration of talent
  - Antitrust (or its absence)
- Labs’ and researchers’ levers:
  - Pausing/slowing (on risky systems)
  - Publication & diffusion of ideas
  - Industry self-regulation & professional norms
How can you affect those decisions? It depends on who you are.
In addition to reasoning from levers like those listed above, it might be useful to try to generate affordances by starting at goals. Try asking “what goals might actor X be able to achieve” or “how can actor X achieve goal Y” in addition to “how can actor X leverage its ability Z.”
Some ways of slowing AI are independent of particular actors’ affordances, but many interventions should focus on causing actors to use their existing abilities well or on giving actors new abilities.
Possible interventions and their properties are listed in the following section. But maybe it can be useful not to think at the level of interventions, but to start from a plan (or playbook, or maybe theory of victory) and then find interventions flowing from that plan. For example, the plan “develop model evaluations and have leading labs agree to them” currently seems promising; we can ask:
- How can the evals plan help slow AI?
- How can we support/facilitate the evals plan? What slowing-related desiderata help enable the evals plan (and what interventions would promote those desiderata)?
- What slowing-related interventions does the evals plan enable or synergize with? In worlds where people work on the evals plan, what new opportunities for slowing AI appear?
(Or not just slowing-related interventions. This paragraph applies to strategy in general, not just slowing AI. And most AI plans aren’t focused on slowing AI; that’s fine.)
A big class of AI plans seems to have components (1) do technical AI safety research and (2) slow the deployment of systems that would cause existential catastrophe (especially by slowing their development). Particular existing plans may incorporate observations on slowing-related frames, considerations, variables, affordances, or interventions. Perhaps we can improve and/or steal the slowing-related component of an existing plan.
This is a list of possible interventions, classes of interventions, and characteristics and consequences of interventions related to slowing AI. It’s poorly organized and I don’t analytically endorse it. I also don’t necessarily endorse listed interventions. (One major ambiguity here: interventions by whom? Different actors have different options.)
- Help labs slow down for safety (now or later)
  - Cause labs to want to slow down for safety
    - Advocacy to labs and the ML research community
    - Make safety more prestigious
  - Help labs determine what is risky
    - Make safety research legible to labs
  - Help labs coordinate to slow down
    - Help labs develop and agree to safety standards
    - Help labs make themselves partially transparent
    - Make labs’ (and researchers’) attitudes on risk and safety common knowledge
    - Make labs’ (and researchers’) shared values common knowledge
    - Katja Grace says: “Formulate specific precautions for AI researchers and labs to take in different well-defined future situations, Asilomar Conference style. These could include more intense vetting by particular parties or methods, modifying experiments, or pausing lines of inquiry entirely. Organize labs to coordinate on these.”
  - Cause labs to prefer projects and systems that are less risky and less likely to lead to risk
    - [Not sure how to do that, other than regulation, discussed below]
  - Help researchers or lab employees develop affordances for collective action (Katja Grace says: “Help organize the researchers who think their work is potentially omnicidal into coordinated action on not doing it”)
- Decrease labs’ access to inputs to AI progress
  - Decrease labs’ access to compute (for large training runs)
    - Track compute and regulate access to compute for large training runs
    - Intervene on the supply chain or regulate production
    - Increase the short-term price of compute by increasing short-term demand
    - (Maybe) export controls and regulating trade
  - Decrease labs’ money
    - Decrease investment in AI labs
    - Make AI products less profitable (through policy)
  - Decrease labs’ access to data
    - Cause companies to not train AI on private data
      - Motivated by protecting privacy
      - Motivated by protecting intellectual property
      - Through companies wanting to respect privacy
    - Cause companies to not train AI on synthetic data
    - Cause companies to not train AI on certain parts of the internet
  - Decrease labs’ access to capability-increasing external research, or decrease diffusion of ideas, algorithms, and models (related desideratum: decrease labs’ access to risk-increasing external research)
    - Cause capability-increasing ideas to not be published or otherwise propagated
  - Decrease labs’ access to research talent
    - Cause researchers to less prefer to do work that increases risk (e.g., perhaps large language models, reinforcement learning agents, and compute-intensive models vs self-driving cars and image generation) (somewhat related: prestige races)
    - Cause researchers to less prefer to work at leading labs
    - Tell your friends not to advance risky AI capabilities
    - Avoid actions that cause people to pursue careers in which they advance risky AI capabilities (including some AI safety research training programs)
    - Improve migration of AI talent
      - Increase emigration from China
  - Decrease labs’ access to AI research tools or ability to automate research
    - Make it harder to deploy AI research tools
- Policy hinders developing, deploying, and profiting from AI
  - Policy differentially hinders developing, deploying, and profiting from AI that is relatively risky or likely to lead to risky systems (e.g., large language models) (so labs substitute from more-risky to less-risky projects)
  - Regulation & standards
- Katja Grace says: “Try to get the message to the world that AI is heading toward being seriously endangering. If AI progress is broadly condemned, this will trickle into myriad decisions: job choices, lab policies, national laws. To do this, for instance produce compelling demos of risk, agitate for stigmatization of risky actions, write science fiction illustrating the problems broadly and evocatively (I think this has actually been helpful repeatedly in the past), go on TV, write opinion pieces, help organize and empower the people who are already concerned, etc.”
Perhaps some interventions or affordances will only become possible in the future, in particular in the endgame. I’m tentatively excited about last-minute coordination to slow down for safety, enabled by something like scary demos, relevant actors’ attitudes, craziness in the world, and strategic clarity.
Some important actions to slow AI progress don’t fit into the interventions list, like avoid speeding up AI progress, help others slow AI progress, discover new interventions or affordances for relevant actors, and cause [people/organizations/communities/memes] that will slow AI to gain [influence/resources/power].
Some interventions that seem both unpromising and norm-violating are omitted.
(Bay Area) AI safety people often seem to assume that government interventions require persuading politicians, but in fact technocrats could largely suffice.
- Thomas Larsen et al.’s Ways to buy time (2022)
- Katja Grace’s “Restraint is not terrorism, usually” in “Let’s think about slowing down AI” (2022).
Thomas Larsen et al. have lots of specific possibilities: Ways to buy time.

A side benefit of helping labs determine what is risky is that they will do better safety research.
It’s not obvious why a lab might slow down if and only if doing so would cause others to slow down. This relates to my desire for a great account of “racing.”
See e.g. Paul Christiano’s Honest organizations (2018). Why is it good for labs to be able to make themselves partially transparent? Transparency allows actors to coordinate without needing to trust each other. In some scenarios, labs would behave more safely if they knew that others were behaving in certain ways, and moreover they could make deals with others to behave more safely.
In addition to AI killing everyone, security-flavored risks include hacking, chemical and biological engineering, and advantaging adversaries. One frame on publication practices is that American labs publishing helps China (it helps everyone but China is #2 in AI) and “they” are taking value from “us.” Perhaps this frame converges with the AI killing everyone risk in suggesting that American AI capabilities research should be siloed or illegal to share or reviewed by the state (‘born secret’) or something.
Katja Grace says:
E.g. a journal verifies research results and releases the fact of their publication without any details, maintains records of research priority for later release, and distributes funding for participation. (This is how Szilárd and co. arranged the mitigation of 1940s nuclear research helping Germany, except I’m not sure if the compensatory funding idea was used.)
And (if I recall correctly) Daniel Kokotajlo, Siméon Campos, and Akash Wasil also have thoughts on publication mechanisms.
“Endgame” in my usage roughly means when there is sufficient clarity and simplicity that
- Possibilities are few/simple/clear enough to be considered pretty exhaustively and
- It’s useful to optimize directly for terminal goals and do direct search, rather than use heuristics and intermediate goals.
Note that there may not be an endgame. See also AI endgame.