First, I think safety cases are pretty clearly the only sane way to deploy AI, and it’s great that you’re talking about them. Now on to the discussion of how they ought to go.
I most certainly do not prefer option 2! People frequently assure others “I’ve thought of everything!” and then later say “oh well, it’s totally reasonable that I missed this thing that seemed totally irrelevant but turned out to be a huge deal...” Like the moonrise triggering the early warning system and nearly getting everyone nuked.
This is particularly true when you’re dealing with a vastly complex system (an AGI mind) that is trained, not built. And it matters far more once that mind is smarter than you, since it should be expected to think of things, and probably in ways, that you haven’t.
In the case of potentially-takeover-capable AGI (maybe not what you meant by GPT6), I’d far prefer Option 1 if the analogy holds; it would give me a better than 99% chance of survival if the “flight” were the same type as the test cases. The abstract arguments that “we’ve thought of everything” probably wouldn’t get me to that high level of certainty.
But even for airplane safety, I’d prefer a combination of options 1 and 2, empirical and theoretical approaches to safety. And I think everyone else should, and mostly does, too.
On the empirical side, I’d like them to say “we flew it 100 times, and in ten of those, we flew it right into weather we’d never attempt with passengers. We shined lasers at it and flew it through a flock of birds; we flew it into some pretty big drones.”
I want the theoretical claim that we’ve thought of everything that could go wrong to be put to the test empirically. I want it exposed to public scrutiny to see if anyone else can think of anything that could go wrong. I’d like a really thorough, multifaceted safety case before we do anything that even might be the equivalent of loading all of humanity into an experimental plane.
I don’t think we’ll get that, but we should be clear that that’s what a wise and selfless world would do, and we should get as close as possible in this world.
One big outstanding question is how we could do empirical research that would really be anything like the equivalent of Option 1. We can’t quite get there, but we should think about getting as close as possible.
Hi Seth, Thanks for the comment! I agree that empirical evidence is very important for both airplane safety and AI safety. IMO a strong case for Option 2 also has to include empirical tests as described in Option 1. I could’ve written that better and will edit it, thanks.
To uncover weird failure modes (e.g. edge cases like the moonrise incident), it’s essential to have a well-resourced and capable reviewer team that is incentivised to think of such edge cases. However, this would probably still leave some edge cases that nobody ever thought of. I think such black swan events are just hard to anticipate for risk management, especially when one doesn’t have a lot of historical evidence about similar systems.