Not fully understanding things is the default … even non-AI software can’t be fully understood if it is complex enough. We already know how to probe systems we don’t understand a priori, through scientific experimentation. You don’t have to get alignment right on the first try, at least not without the foom/RRSI or incorrigibility assumptions.
The difference with normal software is that at least somebody understands every individual part, and if you collected all those somebodies and locked them in a room for a while they could write up a full explanation. Whereas with AI I think we’re not even like 10% of the way to full understanding.
Also, if you’re trying to align a superintelligence, you do have to get it right on the first try, otherwise it kills you with no counterplay.
That has not been demonstrated.
(“Gestures towards IABIED”
“Gestures towards critiques thereof”)