It is possible that adversarial image examples would appear innocuous to the human eye even while having a strong effect on the model.
If so, I think any hope of human review stopping this sort of thing is gone, since we cannot realistically enforce image forensics on every public surface.
However, I am not sure whether adversarial examples can stay that invisible in real-world settings without the signal getting smothered by sensor noise. In that case, an attacker would need adversarial examples that are robust to sensor noise.
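To make that last point concrete, here is a minimal sketch of how an attacker might try to build a noise-robust perturbation, in the spirit of expectation-over-transformation: average the loss gradient over sampled Gaussian noise so the perturbation keeps fooling the model even after noise is added. The model, input tensor, label, and every hyperparameter below are placeholders I've assumed for illustration, not a claim about any particular system.

```python
import torch
import torch.nn.functional as F

def noise_robust_perturbation(model, image, label, eps=8/255, step=1/255,
                              iters=40, noise_std=0.02, samples=8):
    """Sketch: an L-inf-bounded perturbation intended to stay adversarial
    under additive Gaussian 'sensor noise' with standard deviation noise_std."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(iters):
        loss = torch.zeros(())
        for _ in range(samples):
            # Simulate sensor noise on top of the perturbed image.
            noisy = (image + delta + noise_std * torch.randn_like(image)).clamp(0, 1)
            loss = loss + F.cross_entropy(model(noisy), label)
        loss.backward()
        with torch.no_grad():
            delta += step * delta.grad.sign()  # ascend the noise-averaged loss
            delta.clamp_(-eps, eps)            # keep the change small/imperceptible
        delta.grad.zero_()
    return delta.detach()
```

Whether a perturbation crafted this way actually survives a real camera and its full processing pipeline is an empirical question; this only illustrates the kind of extra work the attacker would have to do.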