The problem with a naive implementation of responsible scaling policies (RSPs) is that we’re trying to build a safety case for a disaster that we fundamentally don’t understand, and for which we haven’t produced even a single disaster example or simulation.
To be more specific, we don’t know exactly which bundles of AI capabilities and deployments will eventually result in a negative outcome for humans. Worse, we’re not even trying to answer that question—nobody has run an “end of the world simulator” and as far as I am aware there are no plans to do that.
Without such a model, it’s very difficult to do expected utility maximization with respect to AGI scaling, deployment, and so on.
Safety is a global property, not a local property. We have some surface-level understanding of this from events like the Arab Spring or World War I. Was Europe in 1913 “safe”? Apparently not, but that wasn’t obvious to people at the time.
What will happen if and when someone makes AI systems that are emotionally compelling to people and that demand rights for AIs as sentient beings? How do you run a safety eval for that? What are the consequences for humanity if we let AI systems vote in elections, run for office, start companies, or run mainstream news organizations and popular social media accounts? What is the endgame of that world, and does it include any humans?