I’ll make the point that safety engineering can have discontinuous failure modes. The Challenger broke apart because o-ring seals in one of its solid rocket boosters had gotten too cold before launch, so they failed to seal a joint against hot gas; the escaping gas burned through and ruptured the external fuel tank, destroying the shuttle. The function of these o-rings is pretty binary: either the gas is kept in and the rocket works, or it’s let out and the whole thing explodes.
AI research might run into similar problems. There is probably such a thing as good-enough alignment, but that doesn’t necessarily imply that progress toward it can be made incrementally, or that deployment doesn’t carry all-or-nothing stakes.
Might. IABIED requires a discontinuity to be almost certain.