I’m still curious for your view on the crypto examples you cited. My current understanding is that people do not expect the security proofs to rule out all possible attacks (a situation I can sympathize with, since I’ve written multiple proofs that rule out large classes of attacks without attempting to cover all possible attacks). So I’m interested in which of the following you hold: (i) you disagree with that, and believe serious onlookers have expected the proofs to be comprehensive; (ii) you agree, but feel it would be impractical to give a correct comprehensive proof, and that this is a testament to the difficulty of proving things; (iii) you feel it would be possible but prohibitively expensive, and are making a quantitative point about the cost of alignment analyses being impractical; or (iv) you feel the crypto case would be practical, but the AI case is likely to be much harder, and you just want to make a directionally analogous update.
I still feel like most of the action is in my skepticism about the (alignment analysis) <--> (security analysis) analogy, but I could get some update out of the analogy if the crypto situation is thornier than I currently believe.