I really appreciate this comment, because safety in cryptography (and computer security in general) is probably the closest analog to safety in AI that I can think of. Cryptographers can only defend against known attacks, while hoping that adding a few more rounds to a cipher will also defend against the next few attacks that are developed. Physical attacks are often just as dangerous as theoretical ones. When a cryptographic primitive is broken or the keys are exposed, it’s game over: there’s no arguing with the machine or with the attackers, and no papering a solution over the problem. You don’t get second chances.
So far I haven’t seen an analysis of the hardware aspect of FAI on this site. It isn’t sufficient for an FAI to have a logical, self-reflective model of itself and its goals. It also needs an accurate physical model of itself and of how that physical substrate implements its algorithms and goals. It’s no good if an FAI discovers that, by aiming a suitably powerful source of radiation at a piece of non-human hardware in the real world, it can instantly maximize its utility function. It’s no good if a bit flip in its RAM makes it start maximizing paperclips instead of CEV. Even if we had a formally proven model of FAI that we were convinced would work, I think we’d be fools to actually start running it on the commodity hardware we have today. I think making the hardware more reliable than the software is probably the simpler engineering problem, but once the FAI is running, something going seriously wrong in the hardware at any point in its lifetime would be an existential risk.
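To make the bit-flip worry concrete, here is a toy sketch in Python. Everything in it is hypothetical and invented for illustration: the `GOALS` table, the `flip_bit` helper, and the idea that a goal would be stored as a single integer index. It only shows that one inverted bit can silently change which goal is selected, and that keeping three copies and taking a majority vote (triple modular redundancy, one standard mitigation) masks a single flip. It is not a claim about how a real FAI would be built or protected.

```python
from collections import Counter

# Hypothetical goal table, for illustration only.
GOALS = {0: "CEV", 1: "paperclips"}


def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit inverted, simulating a single bit flip in RAM."""
    return value ^ (1 << bit)


def majority(copies: list[int]) -> int:
    """Return the value held by a strict majority of the redundant copies."""
    value, count = Counter(copies).most_common(1)[0]
    if count <= len(copies) // 2:
        raise RuntimeError("no majority: too many copies corrupted")
    return value


# Unprotected: one flipped bit silently changes the selected goal.
goal_index = 0
corrupted = flip_bit(goal_index, 0)
print(GOALS[goal_index], "->", GOALS[corrupted])

# Three redundant copies: a single flipped copy is outvoted.
copies = [goal_index, goal_index, goal_index]
copies[1] = flip_bit(copies[1], 0)
print(GOALS[majority(copies)])
```

The vote only helps while fewer than half of the copies are corrupted, so this kind of redundancy reduces the risk from isolated faults rather than eliminating it.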