Very interesting! Reminds me of this paper: https://arxiv.org/abs/2310.03693
If these models are unavoidably fragile, we may need to focus more on model guardrails or external safety mechanisms instead.
Or do we need to keep re-aligning models? I’m not sure how well guardrails or control will scale to AGI and beyond.
I wonder if there are technological shields that could be developed, using intelligence, to protect privacy, similar to VPNs. For example, a tool could suggest parts of my face to cover to hide my heart rate, or insert noise into my calls or browsing history.