One would need to be very careful to specify how a shield AI would function.
Of course! I don’t pretend that it’s easy; it’s just that it may require, say, 6 person-years of work instead of 400 person-years of work, and thus be a project that actually could be completed before unFriendly AIs start launching.
You seem to want a shield AI that is not given any specific method of being turned off.
I mean, you could have an off-switch that expires after the first two years or so, perhaps based on the decay of a radioactive element in a black box. The quantity of the element would be sealed in the box before the shield AI is turned on, and the exact quantity would be unknown to all but a very small number of researchers (perhaps just one), who does her calculations on pencil and paper and then shreds and eats them. That way you could get a sense of the AI’s actual goals (since it wouldn’t know when it was safe to ‘cheat’) during whatever little time is left before unFriendly AI launches capable of taking over the off-switch become a serious threat, and, if necessary, abort.
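To make the uncertainty concrete, here is a toy sketch of the idea. Every quantity in it (the element, the half-life, the thresholds, the secrecy range) is an illustrative assumption of mine, not something from the thread; the point is only that a secret initial quantity translates into an expiry date the AI cannot predict, while the researcher can still guarantee a roughly two-year window:

```python
import math
import random

# Toy model of the proposed black-box off-switch (all numbers are
# illustrative assumptions). The switch stays armed while enough atoms
# of a radioactive element remain; once decay drops the count below a
# threshold, the off-switch permanently expires.

HALF_LIFE_DAYS = 138.0     # e.g., roughly polonium-210's half-life
THRESHOLD_ATOMS = 1e20     # switch disarms below this remaining count

def expiry_day(initial_atoms: float) -> float:
    """Day on which the remaining atoms fall below the threshold.

    From N(t) = N0 * 2**(-t / half_life), solving N(t) = threshold
    gives t = half_life * log2(N0 / threshold).
    """
    return HALF_LIFE_DAYS * math.log2(initial_atoms / THRESHOLD_ATOMS)

# The lone researcher secretly picks the initial quantity from a range
# chosen so the window is always somewhere between ~1.5 and ~2.5 years,
# but the exact expiry day is unknown to everyone else (and to the AI).
secret_initial = random.uniform(2e21, 6e21)

print(f"Off-switch expires around day {expiry_day(secret_initial):.0f}")
```

With these made-up numbers, the expiry falls somewhere between roughly day 600 and day 800, so an AI that behaved well only while it believed the off-switch was live would have no safe date on which to start ‘cheating’.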