gwern comments on Prompt injection in Google Translate reveals base model behaviors behind task-specific fine-tuning

gwern 9 Feb 2026 1:31 UTC
5 points
0
If it’s related to any public Google family of models, my guess would have been that it’s some sort of T5Gemma, perhaps. Gemini Flash is overpowered and should know what it is, but the Gemmas are ruthlessly optimized and downsized to be as cheap as possible but also maybe ignorant. (They are also probably distilled or otherwise reliant on the Gemini series, so if you think they are Gemini-1.5 or something based solely on output text evidence, that might be spurious.)
- Petropolitan 9 Feb 2026 21:19 UTC
  1 point
  0
  Parent
  Hm-m, an interesting thought I for some reason didn’t consider!
  I don’t think anyone has demonstrated the vulnerability of encoder-decoder LLMs to prompt injection yet, although it doesn’t seem unlikely and I was able to find this paper from July about multimodal models with visual encoders: https://arxiv.org/abs/2507.22304
  Hope some researcher take this topic and test T5Gemmas on prompt injection soon