I don’t think security mindset means “look for flaws.” That’s ordinary paranoia. Security mindset is something closer to “you better have a really good reason to believe that there aren’t any flaws whatsoever.” My model is something like “A hard part of developing an alignment plan is figuring out how to ensure there aren’t any flaws, and coming up with flawed clever schemes isn’t very useful for that. Once we know how to make robust systems, it’ll be more clear to us whether we should go for melting GPUs or simulating researchers or whatnot.”
That said, I have a lot of respect for the idea that coming up with clever schemes is potentially more dignified than shooting everything down, even if clever schemes are unlikely to help much. I respect carado a lot for doing the brainstorming.
I think a better way of phrasing it is “clever schemes have too many moving parts and make too many assumptions, and each assumption we make is a potential weakness an intelligent adversary can and will optimize for”.
i would love a world-saving-plan that isn’t “a clever scheme” with “many moving parts” but alas i don’t expect it’s what we get. as clever schemes with many moving parts go, this one seems not particularly complex compared to other things i’ve heard of.