Also, we don’t know what would happen if we exactly optimized an image to maximize the activation of a particular human’s face detection circuitry. I expect that the result would be pretty eldritch as well.
We may already be doing that in the case of cartoon faces, with their exaggerated features. Cartoon faces don’t look eldritch to us, but why would they?
They are still smooth and have low-frequency patterns, which seems to be the main difference from adversarial examples currently produced from DL models.
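To make the smoothness point concrete, here is a toy sketch (everything in it is assumed for illustration, not any real model): a random linear map stands in for a "detector," and we compare plain gradient ascent on its activation with ascent whose gradient is low-pass filtered, a crude low-frequency prior. The unconstrained optimum inherits the detector weights' high-frequency noise, while the filtered version stays smooth:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))  # stand-in "detector" weights (assumed, not a real model)

def activation(x):
    # Toy detector: a linear map over the 16x16 "image".
    return float(np.sum(w * x))

def high_freq_energy(x):
    # Fraction of spectral energy outside a small low-frequency disc.
    F = np.fft.fftshift(np.fft.fft2(x))
    yy, xx = np.mgrid[-8:8, -8:8]
    mask = (yy**2 + xx**2) > 16
    return float(np.sum(np.abs(F[mask])**2) / np.sum(np.abs(F)**2))

def low_pass(g):
    # Zero out high spatial frequencies of the gradient.
    F = np.fft.fftshift(np.fft.fft2(g))
    yy, xx = np.mgrid[-8:8, -8:8]
    F[(yy**2 + xx**2) > 16] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

def ascend(smooth, steps=200, lr=0.1):
    x = np.zeros((16, 16))
    for _ in range(steps):
        grad = w                      # d(activation)/dx for a linear detector
        if smooth:
            grad = low_pass(grad)     # crude low-frequency prior
        x = x + lr * grad
        x = x / max(1.0, np.linalg.norm(x) / 10)  # keep the "image" norm bounded
    return x

x_plain = ascend(smooth=False)   # noisy, high-frequency optimum
x_smooth = ascend(smooth=True)   # smooth optimum, slightly lower activation
```

The plain optimum is just a rescaled copy of the noise-like weights, analogous to the high-frequency adversarial examples the comment refers to; the filtered one trades a little activation for smoothness, loosely analogous to cartoon-style exaggeration.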
Yeah. Wake me up when we find a single agent which makes decisions by extremizing its own concept activations. E.g., I’m pretty sure that people don’t, on reflection, most strongly want to make friends with entities which maximally activate their potential-friend detection circuitry.