I think that in order to achieve this you probably have to do lots of white-box things, like watching the AI’s internal state, attempting to shape the direction of its learning, watching carefully for pitfalls. And I expect that treating the AI more as a black box and focusing on containment isn’t going to be remotely safe enough.
I agree!