Here is a broad sketch of how I’d like AI governance to go. I’ve written this in the form of a “plan”, but it’s not really a sequential plan; it’s more like a list of the most important things to promote.
1. Identify mechanisms by which the US government could exert control over the most advanced AI systems without strongly concentrating power. For example, how could the government embed observers within major AI labs who report to a central regulatory organization, without that regulatory organization having strong incentives and the ability to use its power against its political opponents?
   - In practice I expect this will involve empowering US elected officials (e.g. via greater transparency) to monitor and object to misbehavior by the executive branch.
2. Create common knowledge between the US and China that the development of increasingly powerful AI will magnify their own internal conflicts (and empower rogue states) disproportionately more than it empowers them against each other. So instead of a race to world domination, in practice they will face a “race to stay standing”.
   - Rogue states will be empowered because human lives will be increasingly fragile in the face of AI-designed WMDs. This means that rogue states will be able to threaten superpowers with “mutually assured genocide” (though I’m a little wary of spreading this as a meme, and need to think more about ways to make it less self-fulfilling).
3. Set up channels for flexible, high-bandwidth cooperation between AI regulators in China and the US (including the “AI regulators” in each country who try to enforce good behavior from the rest of the world).
4. Advocate for an ideology roughly like the one I sketched out here, as a consensus alignment target for AGIs.
This is of course all very vague; I’m hoping to flesh it out much more over the coming months, and would welcome thoughts and feedback. Having said that, I’m spending relatively little of my time on this (and focusing on technical alignment work instead).
On (4), I don’t understand why having a scale-free theory of intelligent agency would substantially help with making an alignment target. (Or why this is even that related: how things tend to be doesn’t necessarily make them a good target.)
Oops, good catch. It should have linked to this: https://www.lesswrong.com/posts/FuGfR3jL3sw6r8kB4/richard-ngo-s-shortform?commentId=W9N9tTbYSBzM9FvWh (and I’ve changed the link now).