Luke Bailey

Karma: 96

Stanford PhD Student

Luke Bailey 22 Sep 2023 1:07 UTC
1 point
0
in reply to: Tao Lin’s comment on: Image Hijacks: Adversarial Images can Control Generative Models at Runtime
I think this is an interesting point. We are actually conducting some follow-up work to see how robust our attacks are to various additional “defensive” perturbations (e.g. downscaling, adding noise). As Matt notes, when doing these experiments it is important to also check how such perturbations affect the model’s general vision-language modeling performance. My prior right now is that this technique may be able to defend against the L-infinity-constrained images, but possibly not against the moving patch attacks, which showed higher-level features. In general, adversarial attacks are a cat-and-mouse game, so I expect that if we can show that techniques like this work as a defense, a new training scheme will come along that produces adversarial images robust to such defenses. It is also worth noting that most VLMs already only accept small, low-resolution images. For example, LLaVA (with LLaMA 13B), which is state of the art among open-source models, only accepts images of roughly 200 × 200 pixels, so the above example is not necessarily a fair one.
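To make the "defensive perturbation" idea concrete, here is a toy sketch of downscaling plus added noise on a grayscale image stored as nested lists of floats in [0, 1]. It is not the pipeline from the paper or the follow-up experiments. The function names (`downscale`, `add_noise`, `defend`, `linf_distance`) and the default parameters are hypothetical and chosen only for illustration.

```python
import random


def downscale(image, factor):
    """Average-pool a 2D grayscale image (list of rows) by an integer factor."""
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image dimensions must be divisible by factor")
    pooled = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def add_noise(image, sigma, rng):
    """Add Gaussian noise with standard deviation sigma, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def linf_distance(a, b):
    """Largest absolute per-pixel difference between two same-sized images."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def defend(image, factor=2, sigma=0.02, seed=0):
    """Hypothetical preprocessing step: downscale, then add seeded noise."""
    rng = random.Random(seed)
    return add_noise(downscale(image, factor), sigma, rng)


# Toy example: a checkerboard perturbation of size eps on a flat grey image.
eps = 8 / 255
clean = [[0.5] * 4 for _ in range(4)]
adversarial = [[0.5 + eps * (1 if (x + y) % 2 == 0 else -1) for x in range(4)]
               for y in range(4)]

before = linf_distance(clean, adversarial)
after = linf_distance(downscale(clean, 2), downscale(adversarial, 2))
```

In this toy case the perturbation is purely high-frequency, so average pooling cancels it almost exactly (`after` is close to zero while `before` equals `eps`). That says nothing about optimized attacks, which need not be high-frequency, and it ignores the cost to the model's ordinary performance, which is the trade-off discussed above.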