We are clearly not in a position to do high-reliability AI safety engineering now. But in O(5) years, either we will be doing that sort of work, or we will have paused AI to give ourselves time to figure it out, or shortly after that we’ll be playing Russian roulette with the existence of our species. So I do hope people in the field realize they have signed up for an area that is going to need to become detailed, painstaking, and meticulous rather quickly. Doing some background reading on how other complex-system safety engineering fields do this sort of thing seems useful (personally I come from a high-reliability distributed-systems background: I know how to turn 2-nines into 4-nines or even 5-nines, but I haven’t worked at 6-nines or above).
One issue is that if, say, 99.9% of our ASI personas are aligned (or all are aligned 99.9% of the time) and 0.1% aren’t, then this becomes more like an exercise in building reliable systems from unreliable components, or reliable institutions from unreliable people, or like superintelligent law enforcement work, or superintelligent mental health work — all of which are fields we know something about. We’re not trying to align one ASI CEO, we’re trying to align a whole datacenter full of them: and that gives us access to useful redundancy and fault-tolerance approaches that we don’t have for a single one. And, of course, also to “who watches the watchmen?” problems.
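To make the redundancy point concrete, here is a minimal sketch (my own illustration; the function name and all numbers are assumptions for the example, not anything from above beyond the 99.9% figure) of the standard k-of-n majority-vote calculation. If each instance independently behaves correctly with probability p, the chance that a strict majority of n instances behaves correctly is a binomial tail sum:

```python
from math import comb

def majority_ok(p: float, n: int) -> float:
    """Probability that a strict majority of n independent instances
    behaves correctly, given each is correct with probability p.
    (Binomial tail: sum over k = n//2 + 1 .. n.)"""
    k_min = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

# With p = 0.999 per instance (the 99.9% figure above),
# independent failures compound in our favor quickly:
for n in (1, 3, 5, 7):
    print(f"n={n}: {majority_ok(0.999, n):.12f}")
# n=1 gives ~3 nines; n=3 ~5 nines; n=5 ~8 nines; n=7 ~10 nines.
```

The catch, and where “who watches the watchmen?” bites, is the independence assumption: personas sampled from the same base model will share correlated failure modes, so the real numbers will be worse than this binomial estimate — the classic Knight–Leveson result on N-version programming found exactly this kind of correlation between supposedly independent implementations.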