Thane Ruthenis comments on Sum-threshold attacks

Thane Ruthenis 9 Sep 2023 15:57 UTC
7 points
−3
Another important example: Steganography, hiding messages in plain sight by slightly perturbing the data flowing around in a way only your recipient could understand. Subtle choice of words or phrasings, precisely-placed smudges on a piece of paper…
I think depending on the encoding chosen, steganography is either a central example of a sum-threshold attack (if each individual perturbation makes subtle sense by itself, like saying “this meal is mind-blowing” instead of “amazing” to signal that it’s time to detonate the bomb), or just “in the spirit of” a sum-threshold attack (if each perturbation just signals 0 or 1, is not context-aware, and the full ciphertext is cryptographically encrypted on top of the steganography).
In relation to AI, we have steganography as a convergent LLM alignment challenge and as a potential way to “watermark” LLM outputs.
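A minimal sketch of the second kind of encoding described above, where each perturbation signals only a 0 or a 1 and ignores context. The synonym pairs, the cover-text template, and the function names are all hypothetical, chosen for illustration, and the encryption layer the comment mentions is left out.

```python
# Hypothetical synonym pairs: index 0 encodes bit 0, index 1 encodes bit 1.
SYNONYM_PAIRS = [
    ("amazing", "mind-blowing"),
    ("big", "large"),
    ("quick", "fast"),
    ("happy", "glad"),
]

# Hypothetical cover text with one slot per synonym pair.
TEMPLATE = "The meal was {0}, the portions were {1}, service was {2}, and we left {3}."


def encode(bits):
    """Hide one bit per slot by picking the synonym whose index equals the bit."""
    if len(bits) != len(SYNONYM_PAIRS):
        raise ValueError("need exactly one bit per synonym slot")
    words = [pair[bit] for pair, bit in zip(SYNONYM_PAIRS, bits)]
    return TEMPLATE.format(*words)


def decode(text):
    """Recover the bits by checking which synonym of each pair appears."""
    tokens = text.replace(",", " ").replace(".", " ").split()
    bits = []
    for zero_word, one_word in SYNONYM_PAIRS:
        if one_word in tokens:
            bits.append(1)
        elif zero_word in tokens:
            bits.append(0)
        else:
            raise ValueError(f"neither {zero_word!r} nor {one_word!r} found")
    return bits
```

Each word choice reads as innocuous on its own, and the message exists only once every slot has been read in order. In this sketch, `decode(encode([1, 0, 1, 0]))` gives back `[1, 0, 1, 0]`.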
- Viliam 10 Sep 2023 13:33 UTC
  5 points
  3
  As a sci-fi plot, we could imagine an AI that models people sufficiently well that it can manipulate them by replacing words in a text with their synonyms. Predict the probability of outcome X if the person reads the text with Synonym1, predict the probability of X if the person reads the text with Synonym2, choose the one with higher probability; now do this for all words in the text that have synonyms in a dictionary.
  (I do not expect this to work in reality. First, there is too much noise in the environment to predict such microscopic changes. Second, even if you magically could, the effect of the entire text is unlikely to be a linear combination of effects of individual words.)
  - TsviBT 13 Sep 2023 12:29 UTC
    4 points
    0
    This reminds me of these two Derren Brown videos: https://www.youtube.com/watch?v=43Mw-f6vIbo https://www.youtube.com/watch?v=sEmCQzueyEQ
    
    I assume (but don’t know for sure) that what’s happening in the videos isn’t what it appears to be (e.g. forging handwriting isn’t that hard), but it’s at least an interesting fictional example of a somewhat-additive attack like this.
- FireStormOOO 10 Sep 2023 2:12 UTC
  1 point
  0
  Covert side channels like you’re suggesting would probably be a related and often helpful thing for someone trying to do what OP is talking about, but I think the side channels are distinct from the things they can be used for.
- Dweomite 10 Sep 2023 0:02 UTC
  1 point
  0
  > “in the spirit of” a sum-threshold attack (if each perturbation just signals 0 or 1, is not context-aware, and the full ciphertext is cryptographically encrypted on top of the steganography)
  By this logic, wouldn’t all textual messages qualify? The letters of this comment are individually insignificant but add up to communicating an idea.
  Except they’re not actually “adding”, they’re interacting in a structured way that isn’t commutative or associative. The same letters in a different order wouldn’t “add up” to the same idea. This isn’t subdividing an action into smaller actions; it’s building a complex machine that only functions as an entire unit. It is “more than the sum of its parts.”
  - TsviBT 13 Sep 2023 12:17 UTC
    2 points
    0
    Yeah, this is why I didn’t include steganography. (I don’t know whether adversarial images are more like steganography / a message or more like a sum-threshold attack.)
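The word-by-word synonym procedure from Viliam's comment, and the objection that the effect of a text need not be a linear combination of the effects of its words, can be sketched with a toy example. Everything here is an assumption made for illustration: the two synonym slots, the made-up scores (read them as a predicted chance of outcome X, in percent), and the function names. No real predictive model is implied.

```python
from itertools import product

# Hypothetical synonym options for two slots in a text.
SLOTS = [("good", "great"), ("cheap", "affordable")]

# Made-up per-word contributions for a purely additive predictor.
PER_WORD = {"good": 10, "great": 20, "cheap": 5, "affordable": 15}

# Made-up joint scores for a predictor where the words interact.
JOINT = {
    ("good", "cheap"): 15,
    ("great", "cheap"): 25,
    ("good", "affordable"): 40,
    ("great", "affordable"): 5,
}


def additive_model(words):
    """Each word contributes independently of the others."""
    return sum(PER_WORD[w] for w in words)


def interacting_model(words):
    """The score depends on the combination, not on a sum over words."""
    return JOINT[tuple(words)]


def greedy_choice(model, slots):
    """Choose each slot's word in turn, holding the other slots fixed."""
    current = [options[0] for options in slots]
    for i, options in enumerate(slots):
        current[i] = max(
            options,
            key=lambda w: model(tuple(current[:i] + [w] + current[i + 1:])),
        )
    return tuple(current)


def best_choice(model, slots):
    """Exhaustively score every combination of synonyms."""
    return max(product(*slots), key=model)
```

Under `additive_model`, the greedy pass and the exhaustive search agree on `("great", "affordable")`. Under `interacting_model`, the greedy pass settles on `("great", "cheap")` while the exhaustive search finds `("good", "affordable")`, so per-word optimization only works when the effects actually add.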