I like it.
You’ve listed mostly things that countries should do out of self-interest, without much need for international cooperation. (A bit disanalogous to the UN SDGs.) This is fine, but I think there could also be some useful principles for international regulation of AI that countries could agree to in principle, to pave the way for cooperation even in an atmosphere of competitive rhetoric.
Under Develop Safe AI, it’s possible “Alignment” should be broken down into a few chunks, though I’m not sure. There’s a current paradigm called “alignment” that uses supervised finetuning plus reinforcement learning on large models, in which new reward functions and new ways of leveraging human demonstrations/feedback all bear a family resemblance. And then there’s everything else: philosophy of preferences, decision theory for alignment, non-RL alignment of LLMs, neuroscience of human preferences, speculative new architectures that don’t fit in the current paradigm. Labels might just be something like “Alignment via finetuning” vs. “Other alignment.”
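To make that “family resemblance” concrete: most work in the finetuning paradigm bottoms out in a learned reward signal fit to human comparisons, which then drives an RL or RL-adjacent finetuning step. Here’s a minimal sketch of the standard pairwise reward-model loss in PyTorch; the `reward_model` and the batches of encoded comparisons are hypothetical placeholders, not anything from your doc.

```python
# Illustrative sketch of the pairwise preference (Bradley-Terry) loss used in
# RLHF-style reward modeling. `reward_model` is any network mapping an encoded
# (prompt, response) pair to a scalar score; the batches are hypothetical.
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_batch, rejected_batch):
    """Train the reward model so human-preferred responses score higher.

    chosen_batch / rejected_batch: encodings of the preferred and dispreferred
    (prompt, response) pairs from human comparison data.
    """
    r_chosen = reward_model(chosen_batch)      # shape: (batch,)
    r_rejected = reward_model(rejected_batch)  # shape: (batch,)
    # Negative log-likelihood that the human-chosen response wins under a
    # Bradley-Terry model: -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# The fitted reward model then serves as the objective for an RL step (e.g.
# PPO) or a direct-preference-style finetuning step on the base LLM.
```

Most variation within the paradigm is about where the comparisons come from and how the resulting reward gets used, which is part of why these sub-areas feel like one chunk and everything else feels like a different chunk.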
Under Societal Resilience to Disruption, I think “Epistemic Security Measures” could be fleshed out more. The first thing that comes to mind is letting people tell whether some content or message is from an AI, and empowering people to filter for human content/messages. (Proposals range from legislation outlawing impersonating a human, to giving humans unique cryptographic identifiers, to something something blockchain something Sam Altman.)
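On the cryptographic-identifier idea, the mechanical core is just provenance signatures: a key that has been bound to a verified human signs the content, and clients check the signature before treating it as human. A rough sketch with Ed25519 via the `cryptography` package; key distribution and actually binding keys to humans, which is the hard part, is assumed away here.

```python
# Illustrative sketch of content provenance via signatures: a verified-human
# key signs a message, and a reader checks the signature against the public
# key. How keys get bound to real humans (the hard part) is out of scope.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Key generation would happen once, at identity-verification time.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b"This post was written by a verified human."
signature = private_key.sign(message)

# Any reader (or client-side filter) can check provenance before displaying
# or boosting the content.
try:
    public_key.verify(signature, message)
    print("verified human-signed content")
except InvalidSignature:
    print("unverified or tampered content")
```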
But you might imagine more controversial and dangerous measures, like using your own AI propaganda-bot to try to combat all external AI propaganda-bots, or instituting censorship based on the content of the message and not just whether its sender is human (which could be a political power play, could arise as mission creep from trying to combat non-AI disinformation under the banner of “Epistemic Security,” or could come from expecting AIs or AI-empowered adversaries to have human-verified accounts spreading their messages). I think the category I’m imagining (which may be different from the category you’re imagining) might benefit from a more specific label like “Security from AI Manipulation.”