I think that in order to achieve this you probably have to do lots of white-box things, like watching the AI’s internal state, attempting to shape the direction of its learning, watching carefully for pitfalls. And I expect that treating the AI more as a black box and focusing on containment isn’t going to be remotely safe enough.
I agree!