Memos for Minimal Coalitions
Suppose you think we need some coordinated action, e.g. pausing deployment for 6 months. For each action, there will be many “minimal coalitions” — sets of decision-makers where, if all agree, the pause holds, but if you remove any one, it doesn’t.
For example, the minimal coalitions for a 6-month pause might include:
{US President, General Secretary of the CCP}
{CEOs of labs within 6 months of the frontier}
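To make the notion concrete, here is a minimal sketch (everything in it is illustrative: the hypothetical predicate pause_holds saying whether the pause sticks when a given set of decision-makers agrees, and the named decision-makers themselves). It enumerates the sets where the pause holds but fails if any single member is removed.

```python
from itertools import combinations

decision_makers = [
    "US President",
    "General Secretary of the CCP",
    "CEO of Lab A",
    "CEO of Lab B",
]

def pause_holds(coalition):
    """Toy stand-in: the pause sticks if both heads of state agree,
    or if both frontier-lab CEOs agree."""
    states = {"US President", "General Secretary of the CCP"}
    labs = {"CEO of Lab A", "CEO of Lab B"}
    return states <= coalition or labs <= coalition

def minimal_coalitions(players, holds):
    """Sets where the action holds, but fails if any one member is removed."""
    result = []
    for r in range(1, len(players) + 1):
        for combo in combinations(players, r):
            s = set(combo)
            if holds(s) and all(not holds(s - {p}) for p in s):
                result.append(s)
    return result

for coalition in minimal_coalitions(decision_makers, pause_holds):
    print(coalition)
```

Under this toy predicate the two minimal coalitions are exactly the two listed above: the pair of heads of state, and the pair of frontier-lab CEOs.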
Project proposal: Maintain a list of decision-makers who appear in these coalitions, ranked by importance.[1] For each, compile a memo from public and private statements and other inside-baseball information:
What have they said about AI risk?
What incentives do they face?
What kinds of people do they trust?
Who are their allies and rivals?
What’s the best way to approach them?
How would they update under different evidence, e.g. an AI attempting to self-exfiltrate?
The reason to do this: If a lab discovers something bad and needs to push for a coordinated pause, the people involved are specific people with specific beliefs. The document that the lab leadership will reach for isn’t the one titled “Towards a Framework for Coordinated AI Risk Management” — it’s the one titled “What would persuade Liang Wenfeng to agree to a 6-month pause.”
I don’t know if people are working on this (presumably if they are, it’s not public), but it’s something I’m keen for policy people to work on.
If you like, we can operationalize how important each decision-maker is with Shapley values. Define V(S) as the expected value of the best plan achievable if the people in S are on your side. A decision-maker’s Shapley value is then their marginal contribution to V, averaged over all possible orderings in which players join one at a time.
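As a sketch of that calculation (the value function V below is a toy I’ve made up, not an estimate of anything): enumerate every ordering of the players, credit each player with the change in V at the moment they join, and average over orderings.

```python
from itertools import permutations

def shapley_values(players, V):
    """Each player's marginal contribution to V, averaged over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            totals[p] += V(coalition | {p}) - V(coalition)
            coalition = coalition | {p}
    return {p: totals[p] / len(orderings) for p in players}

# Toy V: the pause (worth 10) only goes through if both heads of state are on side.
def V(S):
    return 10.0 if {"US President", "General Secretary of the CCP"} <= S else 0.0

print(shapley_values(
    ["US President", "General Secretary of the CCP", "CEO of Lab A"], V
))
# Under this toy V, each head of state gets 5.0 and the lab CEO gets 0.0.
```

Enumerating all orderings is exponential, but that’s fine for a handful of decision-makers, and the point isn’t precise numbers anyway; it’s a defensible ranking of who the memos should cover first.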