I am by and large convinced by the arguments that a UFAI is incredibly dangerous and no precautions of this sort would really suffice.
However, once a candidate FAI is built and we’re satisfied we’ve done everything we can to minimize the chance of unFriendliness, we would almost certainly use precautions like these when it’s first switched on, to mitigate the risk of a mistake.
Certainly I’d think Eliezer (or anyone) would have much more trouble with an AI-box game if he had to get one person to convince another to let him out.
Eliezer surely would, but observers being surprised was the point of the AI-box experiment.
As a short, non-technical, and not precisely accurate summary: if people can be surprised once while very confident, then add on extra layers of precaution and become as confident as they were before, they can repeat that cycle forever.