Sum-threshold attacks


How do you affect something far away, a lot, without anyone noticing?

(Note: you can safely skip sections. It is also safe to skip the essay entirely, or to read the whole thing backwards if you like.)

The frog’s lawsuit

Attorney for the defendant: “So, Mr. Frog. You allege that my client caused you grievous bodily harm. How is it that you claim he harmed you?”

Frog: “Ribbit RIBbit ribbit.”

Attorney: “Sir...”

Frog: “Just kidding. Well, I’ve been living in a pan for the past two years. When I started, I was the picture of health, and at first everything was fine. But over the course of the last six months, something changed. By last month, I was in the frog hospital with life-threatening third-degree burns.”

Attorney: “And could you repeat what you told the jury about the role my client is alleged to have played in your emerging medical problems?”

Frog: “Like I said, I don’t know exactly. But I know that when my owner wasn’t away on business, every day he’d do something with the stove my pan was sitting on. And then my home would seem to be a bit hotter, always a bit hotter.”

Attorney: “Your owner? You mean to say...”

Judge: “Let the record show that Mr. Frog is extending his tongue, indicating the defendant, Mr. Di’Alturner.”

Attorney: “Let me ask you this, Mr. Frog. Is it right to say that my client——your owner——lives in an area with reasonably varied weather? It’s not uncommon for the temperature to vary by ten degrees over the course of the day?”

Frog: “True.”

Attorney: “And does my client leave windows open in his house?”

Frog: “He does.”

Attorney: “So I wonder, how is it that you can tell that a slight rise in temperature that you experience (small, by your own admission), how can you be sure that it’s due to my client operating his stove, and not due to normal fluctuations in the ambient air temperature?”

Frog: “I can tell because of the correlation. I tend to feel a slight warming after he’s twiddled the dial.”

Attorney: “Let me rephrase my question. Is there any single instance you can point to, where you can be sure——beyond a reasonable doubt——that the warming was due to my client’s actions?”

Frog: “Ah, um, it’s not that I’m sure that any one increase in temperature is because he turned the dial, but...”

Attorney: “Thank you. And would it be fair to say that you have no professional training in discerning temperature and changes thereof?”

Frog: “That would be accurate.”

Attorney: “And are you aware that 30% of frogs in your state report spontaneous slight temperature changes at least once a month?”

Frog: “But this wasn’t once a month, it was every day for weeks at a ti——”

Attorney: “Sir, please only answer the questions I ask you. Were you aware of that fact?”

Frog: “No, I wasn’t aware of that, but I don’t see wh——”

Attorney: “Thank you. Now, you claim that you were harmed by my client’s actions, which somehow put you into a situation where you became injured.”

Frog: “¡I have third degree burns all ov——”

Attorney: “Yes, we’ve seen the exhibits, but I’ll remind you to only speak in response to a question I ask you. What I’d like to ask you is this: Why didn’t you just leave the frying pan? If you were, as you allege, being grievously injured, wasn’t that enough reason for you to remove yourself from that situation?”

Frog: “I, I didn’t notice that it was happening at the time, each change was so subtle, but...”

Attorney: “Thank you. As your counsel would have advised you, the standard for grievous bodily harm requires intent. Now are we really expected to conclude, beyond a reasonable doubt, that my client intended to cause you harm, via a method that you didn’t even notice? That even though you can’t point to so much as a single instance where my client even provably had anything to do with the temperature of your frying pan home, even a single instance that is inconsistent with normal fluctuations in temperature or with normal frog life… despite that, you claim to know that Mr. Di’Alturner intended to cause you grievous bodily harm?”

Frog: “Well yes, because I can see now, in retrospect, that there was a pattern...”

Attorney: “A pattern? We are in a court of law, presenting our cases to this fine jury to decide whether my client has committed any crimes. We’re not in Salem, sussing out patterns like so much spellcraft. You can’t just wake up one day and decide to blame my client for your medical problems because you think there’s some ephemeral pattern that you’re only seeing now. Crimes are actions——clear cut, definable, visible actions that violate the law. And by your own admission, you can’t point to a single such action. The defense rests, your Honor.”

Thanks to Gaurav Sett for reminding me of the boiling frog.

Controlling the sum

Suppose that you are relaying a message from Alice to Bob. Alice sends you a vector $x = (x_1, \dots, x_{100})$ of real numbers with one hundred dimensions, and you’re supposed to pass it on to Bob. Alice chooses $x$ to communicate a message to Bob, but she also adds in some small Gaussian noise, and Bob knows that she does this. Your goal is to send Bob a vector $y$ that has a large sum, i.e. $\sum_i y_i$ is high.

One way you could do this is to set the first coordinate, $y_1$, to be really high, like $y_1 = x_1 + 100$, and otherwise set $y_i = x_i$ for $i > 1$. This makes the sum large. But it also makes it obvious to Bob that the vector has been tampered with. He’s going to read the vector, and he’ll notice that the first coordinate is huge and doesn’t fit with the message.

What else can you do?

Here’s how you can make the sum large without Bob noticing: just add a small amount to each coordinate, like $y_i = x_i + 1$. Then Bob could attribute each coordinate’s offset to noise, and you’ve made the sum of $y$ greater by 100.
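The relay example can be simulated. This is a minimal sketch under assumptions that are not in the text: Bob can compare each coordinate against the intended message, so what he sees is Alice’s noise plus whatever you added; the noise has standard deviation 1; and Bob raises an alarm at 5 standard deviations. All function names are hypothetical. The sketch compares a per-coordinate check with a check on the sum.

```python
import math
import random

N = 100         # dimensions, as in the text
SIGMA = 1.0     # assumed std dev of Alice's Gaussian noise
BUDGET = 100.0  # how much we want to raise the sum
Z = 5.0         # assumed alarm threshold, in standard deviations


def spike(n=N, budget=BUDGET):
    """Tampering that puts the whole budget into the first coordinate."""
    return [budget] + [0.0] * (n - 1)


def spread(n=N, budget=BUDGET):
    """Tampering that adds budget / n to every coordinate."""
    return [budget / n] * n


def observed(delta, rng):
    """What Bob sees relative to the intended message: noise plus tampering."""
    return [rng.gauss(0.0, SIGMA) + d for d in delta]


def coordinate_alarm(obs):
    """Bob's per-coordinate check: is any single coordinate implausible?"""
    return max(abs(v) for v in obs) > Z * SIGMA


def sum_alarm(obs):
    """A check on the sum: N independent noise terms sum to std SIGMA * sqrt(N)."""
    return abs(sum(obs)) > Z * SIGMA * math.sqrt(len(obs))


def alarm_rates(delta, trials=2000, seed=0):
    """Fraction of trials in which each check fires."""
    rng = random.Random(seed)
    coord_hits = sum_hits = 0
    for _ in range(trials):
        obs = observed(delta, rng)
        coord_hits += coordinate_alarm(obs)
        sum_hits += sum_alarm(obs)
    return coord_hits / trials, sum_hits / trials


if __name__ == "__main__":
    for name, delta in [("spike", spike()), ("spread", spread())]:
        coord_rate, sum_rate = alarm_rates(delta)
        print(f"{name:6s} sum raised by {sum(delta):.0f}: "
              f"coordinate alarm {coord_rate:.3f}, sum alarm {sum_rate:.3f}")
```

With these settings the per-coordinate check catches the spike essentially every time and the spread almost never, while the check on the sum catches both: the sum of 100 unit-variance noise terms has standard deviation 10, so a shift of 100 is about ten standard deviations.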

Sum-threshold attacks

The general idea is that an agent can push on something in a way that no one notices by distributing its effects through lots of different channels. That way no single channel carries a noticeable amount of optimization power.

Normally an agent affects something by pushing on it hard through one big channel. For example, I can open a door by pushing on it with my hand. That’s one big channel: one stream of signals through one nerve activates one big push.

But an agent can also affect something by pushing on it just a little bit through each one of many different channels. For example, I could open the door by simultaneously pushing on it just a little bit with my hand, and also pushing just a bit with my foot, and also blowing on it, and also activating a weak little motor embedded in the hinge, and also having a friend wave a magnet around so that the doorknob is pulled forward, and so on. I could set it up so that the force applied to the door is barely noticeable from any one source, but in aggregate, the force is enough to open the door.

The door example is kind of silly, but this sort of sum-threshold attack happens all the time in social situations. For example, suppose that whenever Alice talks about plums, Bob makes a subtle disapproving gesture, like a slight look of annoyance, or briefly turning away, or hardening the tone of his voice a bit. Each individual gesture may be subtle enough that Alice never consciously registers that Bob reacted negatively to something she said, and yet her behavior could still change a lot. She might intuitively avoid mentioning plums (or, more likely, some broader category like all food). The aggregate effect is large, even though each contribution to the effect is small.

The name is supposed to suggest a mismatch. The ultimate effect is measured as a sum——how averse Alice is to talking about plums is an aggregate of all of Bob’s subtle nudges. But whether Alice notices Bob’s influence might be more of a threshold. If Bob makes an obvious gesture, like yelling at Alice when she mentions plums, then Alice will notice. But if each gesture is subtle, then maybe none of the gestures will rise above the threshold beyond which Alice would notice. A sum-threshold attack produces a large sum of all the coordinates taken together, but stays below the threshold in each coordinate taken on its own. The threshold quantity might be: is it legal; is it noticeable; is it describable; is it worth addressing; is it worth caring about; is it unambiguous; does it show intent.
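The definition in the paragraph above can be written as a predicate. This is a sketch: the forces, the 5 N noticing threshold, and the 15 N needed to open the door are made-up numbers for the door example, and `notice_threshold` could equally stand for any of the threshold quantities just listed (legality, describability, and so on).

```python
def is_sum_threshold_attack(contributions, notice_threshold, target_effect):
    """True if no single channel reaches the noticing threshold,
    yet the channels together reach the target effect."""
    contributions = list(contributions)
    below_radar = all(abs(c) < notice_threshold for c in contributions)
    return below_radar and sum(contributions) >= target_effect


# Hypothetical forces on the door, in newtons (illustrative numbers only).
door_pushes = {"hand": 4.0, "foot": 4.0, "breath": 0.5, "hinge motor": 4.5, "magnet": 3.0}
NOTICE = 5.0  # assumed: no one notices a push under 5 N from a single source
OPEN = 15.0   # assumed: the door swings open at 15 N total

print(is_sum_threshold_attack(door_pushes.values(), NOTICE, OPEN))  # True
print(is_sum_threshold_attack([16.0], NOTICE, OPEN))  # False: one big, noticeable push
```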

Legibility

A sum-threshold attack is sneaky. It stays under the radar, letting the optimization flow through many separate channels before the channels reconverge in the target.

Not only might the target not notice the attack, but onlookers may also have trouble recognizing that a sum-threshold attack is happening. That’s because the natural way to demonstrate an attack is to show: here is a channel of optimization, and see, there’s a lot of optimization power flowing through this channel from Bob to Alice.

In a sum-threshold attack, there’s no one coordinate in the vector that’s especially out of place. Mr. Frog’s owner never turned the stove up by more than a tiny increment, on any one day. So no one can point to one or a few coordinates——one or a few actions——that demonstrate the attack. The attack is rotated out-of-basis. It’s not large on any one coordinate; it’s large in some other direction.
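A small sketch of what “rotated out-of-basis” means, with hypothetical numbers: one small turn of the stove dial per day for 100 days, compared with one single turn ten times as large. In the basis of days, the daily nudges are small in every coordinate; along the unit direction that weights every day equally (a hypothetical “pattern” coordinate), they are large.

```python
import math


def component_along(v, direction):
    """Signed length of v's projection onto `direction` (normalized here)."""
    length = math.sqrt(sum(d * d for d in direction))
    return sum(vi * di for vi, di in zip(v, direction)) / length


N = 100  # hypothetical: 100 days
daily_nudges = [1.0] * N                    # a tiny turn of the dial every day
single_incident = [10.0] + [0.0] * (N - 1)  # one bigger turn on one day
pattern = [1.0] * N                         # "every day pushed the same way"

for name, v in [("daily nudges", daily_nudges), ("single incident", single_incident)]:
    print(f"{name}: largest coordinate {max(abs(x) for x in v)}, "
          f"component along pattern {component_along(v, pattern)}")
```

The daily nudges have a largest coordinate of 1 but a component of 10 along the pattern direction; the single incident is the reverse, 10 and 1.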

For some abuse victims, this is part of why they have trouble saying what the abuse consisted of. Any given action, any specific incident, seems small and ambiguous. Did the temperature go up a bit because of the stove dial, or because of random fluctuation? How sure are you that Bob subtly disapproved of your friendship with someone else, rather than just being annoyed about something unrelated from his day?

There’s no good answer to “OK, you have two minutes to describe what happened to you. Tell me your top three examples of Bob’s bad behavior.” If someone refuses to compute “abuse” as a possible coordinate (doesn’t admit abuse as a single direction in the space of behaviors, doesn’t recognize “he was turning up the dial gradually, adding up to boiling”), then there’s no coordinate you can show them, no single action by Bob, that’s clearly bad and has a large magnitude. You don’t have a word (or they won’t let you use the word) that indicates the direction in which the vector is long: the feature of Bob’s behavior that blows way past the threshold of noticing and caring, so that they’ll have to listen. You don’t have that word, the people you’re trying to convince will only accept an action as attention-worthy if it’s above a threshold of badness, and each one of Bob’s actions is below that threshold. “Pattern”? “Gaslighting”? What is this, a witch hunt?

By making new words and phrases, we can change what measurements are easy, natural, widely known, and in logical common knowledge.

More examples

A meme, for your consideration

If each piece of evidence is evaluated on its own, weighed against all the evidence in favor of the incumbent theory, then each new piece of evidence against the incumbent will, in turn, lose the fight to overturn the incumbent theory. The army of argument-soldiers for the incumbent theory will win each time by defeat in detail. One argument loses against the army.

If you try to demonstrate that a sum-threshold attack is being carried out, you have only a disconnected series of weak arguments. Each one of your arguments on its own is far below the threshold needed to defeat the incumbent theory, which says that there’s no attack. You need a coordinate that tracks the sum. That coordinate marshals all your arguments, combining their strength. There is a forest, not just many trees; there is a blazing fire, not just many small flames.
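Defeat in detail, and what changes when the evidence is pooled, can be illustrated with Bayes’ rule in log-odds form. The numbers are assumptions for illustration only: prior odds of 1:1000 for an attack, and ten observations that are each three times likelier if there is an attack, treated as conditionally independent (which real evidence often is not). The summed log-likelihood ratio plays the role of the coordinate that tracks the sum.

```python
import math

PRIOR_ODDS_ATTACK = 1 / 1000    # assumed: incumbent "no attack" theory starts 1000:1 ahead
LIKELIHOOD_RATIOS = [3.0] * 10  # assumed: ten weak observations, each 3x likelier under "attack"


def posterior_odds(prior_odds, ratios):
    """Bayes' rule in odds form, assuming conditionally independent observations."""
    log_odds = math.log(prior_odds) + sum(math.log(r) for r in ratios)
    return math.exp(log_odds)


# Defeat in detail: each observation fights the whole incumbent army alone.
one_at_a_time = [posterior_odds(PRIOR_ODDS_ATTACK, [r]) for r in LIKELIHOOD_RATIOS]
print(max(one_at_a_time) > 1)  # False: every single observation loses

# Tracking the sum: the log-likelihood ratios add up on one coordinate.
combined = posterior_odds(PRIOR_ODDS_ATTACK, LIKELIHOOD_RATIOS)
print(combined > 1)  # True: odds of about 59:1 in favor of "attack"
```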

DDoS attack

A distributed denial-of-service attack shuts down a service by flooding it with requests from many different users, overwhelming the service provider so that it can’t provide good service to anyone.

You can’t stop the attack by blocking, shutting off, or punishing any one user. Any one user request could be legitimate, so you can’t prove any one user is doing something wrong.
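A toy model of why per-user blocking fails against a DDoS attack. It assumes a hypothetical per-user rate limit and a total capacity (both numbers invented), and compares one heavy attacker with 500 hypothetical bots sending the same total traffic.

```python
from collections import Counter

PER_USER_LIMIT = 10  # assumed: max requests per second from one user before blocking
CAPACITY = 1000      # assumed: requests per second the service can handle in total


def blocked_users(requests):
    """Per-user defense: block any single user over the rate limit."""
    counts = Counter(requests)
    return {user for user, n in counts.items() if n > PER_USER_LIMIT}


def load_after_blocking(requests):
    """Requests that still reach the service once blocked users are dropped."""
    blocked = blocked_users(requests)
    return sum(1 for user in requests if user not in blocked)


# 2000 requests per second from one source, versus 500 hypothetical bots
# sending 4 requests each (also 2000 in total).
single = ["attacker"] * 2000
botnet = [f"bot{i}" for i in range(500) for _ in range(4)]

for name, reqs in [("single attacker", single), ("botnet", botnet)]:
    load = load_after_blocking(reqs)
    print(f"{name}: blocked {len(blocked_users(reqs))} users, "
          f"remaining load {load}, overloaded {load > CAPACITY}")
```

The single attacker is blocked and the load drops to zero; no bot exceeds the per-user limit, so nothing is blocked and the full 2000 requests still overload the service.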

Systemic oppression

Were you refused service because you’re black, or because of a misunderstanding? Were you passed over for a promotion because of your performance, or because you’re a woman? Is the guy talking down to you just arrogant, or specifically making assumptions about you because of how you look? Is she asking where you grew up because she likes learning about people, or because she wants you out of her country——not enough to legibly hurt you, but enough to distributedly coordinate with other racists to make it clear to you that you aren’t welcome? Microaggressions can add up to macroaggressions.

Some hermeneutic injustice (h/t TJ) can be corrected by adding another coordinate to the space of perception defined by language. A boss pressuring an employee to sleep with him could apply a lot of pressure in sum, staying below the threshold, but not as easily if the employee has the coordinate “sexual harassment” and can expect others to acknowledge that coordinate. On that coordinate, the boss’s behavior is high-magnitude.

From “Oppression” by Marilyn Frye:

Cages. Consider a birdcage. If you look very closely at just one wire in the cage, you cannot see the other wires. If your conception of what is before you is determined by this myopic focus, you could look at that one wire, up and down the length of it, and be unable to see why a bird would not just fly around the wire any time it wanted to go somewhere. Furthermore, even if, one day at a time, you myopically inspected each wire, you still could not see why a bird would have trouble going past the wires to get anywhere. There is no physical property of any one wire, nothing that the closest scrutiny could discover, that will reveal how a bird could be inhibited or harmed by it except in the most accidental way. It is only when you step back, stop looking at the wires one by one, microscopically, and take a macroscopic view of the whole cage, that you can see why the bird does not go anywhere; and then you will see it in a moment. It will require no great subtlety of mental powers. It is perfectly obvious that the bird is surrounded by a network of systematically related barriers, no one of which would be the least hindrance to its flight, but which, by their relations to each other, are as confining as the solid walls of a dungeon.

Adversarial image attacks

From “Exploring the Space of Adversarial Images”:

In each column, the top image plus the slight perturbation in the middle image gives the lower image. The image classifier AI correctly classifies all the top images, and incorrectly classifies all the bottom images. To the human eye, the perturbed images are nearly indistinguishable. For the volcano, I can see that there’s slight distortion, but it’s still basically the same; for the other images, I can only tell that there’s distortion with the guidance of the perturbation image; and for the kit foxes, I can’t tell at all.

Is this really a sum-threshold attack? I’m not sure. It’s not literally a sum-threshold attack. But it’s some kind of “lots of small perturbations which don’t look like much affect some distant variable a lot”. It looks more like an attack in the spirit of a sum-threshold attack if

  • multiplying the perturbation by 1.1 increases the confidence of the neural net’s erroneous judgement; and

  • randomly resetting some of the pixels in the perturbation back to neutral leaves a perturbation that still has much of the effect.
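One way to see why small per-pixel changes can move a classifier a lot is to look at a linear model, where the effect really is a sum. This sketch uses a randomly generated linear scorer, not the network or images from the paper above. It applies a fast-gradient-sign-style perturbation of size `EPS` to every pixel, so the score shifts by `EPS` times the sum of the absolute weights, and then checks the two conditions just listed: scaling the perturbation by 1.1, and randomly resetting about half of the pixels.

```python
import random

rng = random.Random(0)
N_PIXELS = 10_000
EPS = 0.01  # assumed: per-pixel change, small next to the [0, 1] pixel range

# Hypothetical linear "classifier": score(x) = w . x, positive means class A.
weights = [rng.gauss(0.0, 1.0) for _ in range(N_PIXELS)]
image = [rng.random() for _ in range(N_PIXELS)]


def score(x):
    return sum(w * xi for w, xi in zip(weights, x))


def sign(v):
    return (v > 0) - (v < 0)


# Fast-gradient-sign-style step: for a linear scorer the gradient is `weights`,
# so nudge every pixel by EPS in the direction that pushes the score away
# from the image's current class.
away = -sign(score(image))
perturbation = [away * EPS * sign(w) for w in weights]


def score_shift(scale=1.0, keep_fraction=1.0, seed=1):
    """Change in score after adding scale * perturbation, with a random
    (1 - keep_fraction) share of pixels reset to no perturbation."""
    keep_rng = random.Random(seed)
    perturbed = [
        xi + scale * d if keep_rng.random() < keep_fraction else xi
        for xi, d in zip(image, perturbation)
    ]
    return score(perturbed) - score(image)


full = score_shift()
print("largest per-pixel change:", max(abs(d) for d in perturbation))
print("score shift, full perturbation:", round(full, 2))
print("ratio after scaling by 1.1:", round(score_shift(scale=1.1) / full, 3))
print("ratio with half the pixels reset:", round(score_shift(keep_fraction=0.5) / full, 3))
```

For a linear scorer, scaling by 1.1 scales the shift by exactly 1.1, and resetting a random half of the pixels leaves roughly half of the shift. Whether a deep network behaves this way is an empirical question that this toy does not settle.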

Other names

I’m not sure what to call this sort of thing. Is there a preexisting name?

Instead of “attack”, we could say “adversary” or “channel” or “optimization” or “effect” or “transmission”.

Instead of “sum-threshold” we could say:

  • Out-of-basis, rotated, basis misaligned, off-norm, alternate norm, norm mismatch. The space of behavior can be thought of as a vector space, where the dimensions are something like features that we tend to notice. Then there’s (something like) a norm on this vector space, which says how much we notice. This norm is not the $L^2$ norm (or a conjugate of it), so it’s not invariant under a full group of rotations. In practice, this norm-like valuation has a full distinguished basis. For example, asking “what’s the worst thing that happened” is putting an $L^\infty$ norm on the space where the dimensions are “things that happened”, where “things that happened” include “Alice picked up a cup” but not “Alice picked up her cup very slightly faster than usual every day this month”. The vector that has $\epsilon$ in every coordinate has very low $L^\infty$ norm, but could be a very long vector. It’s rotated out of alignment with the basis, so it’s hard to notice. This isn’t really the best image, though, because the same effect can be gotten if the “noticing norm” is $L^2$, which doesn’t have a distinguished basis, while the “attack norm” is $L^1$. Hence the name “norm mismatch”.

  • Frogboiling. The boiling frog is more specific, though. It’s also about progressively getting accustomed to a new normal.

  • Timothee Chauvin: illegible nudging, persistent low-grade nudging, stealth nudging.

  • Sum-max. If the question is “what’s the worst that happened”, that’s more of a maximum than a threshold.

  • Broadband, wide-channel, breadth-based; broadshallow, shallowwide, shallowstrong, widestrong, broadstrong; broad-shallow-strong.

  • Ramified, multi-channel, many-pathed.

  • Delta, diamond, many-tributaries, anastomosing, anabranching, braided, bayou. Part of the core idea of sum-threshold attacks is that from the attacker, many lines of optimization flow; they go their separate ways, each one unnoticed; and then they converge in the target, producing a large effect. This suggests the geometric image of a diamond, with the attacker and the target at its two acute points. It also suggests a river which spreads and branches (like a river delta), and then later the anabranches recombine. That’s called anastomosis. The image of a braided river, where the anabranches are constantly shifting their route, suggests an attacker who is shifting optimization channels. [Image from https://fifthgoal.wordpress.com/2010/09/17/the-brahmaputra-river/]

  • Distributed, diffuse, unconcentrated.

  • Teleporting. As if the effect teleports from the attacker to the target.
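The norm-mismatch idea in the list above can be checked numerically. A short sketch, assuming 10,000 coordinates and ε = 0.01 (both numbers chosen for illustration): the vector with ε in every coordinate has the same L² length as a single unit spike, but an L¹ norm (the total push) 100 times larger, and an L∞ norm (the worst single thing) 100 times smaller.

```python
import math


def l_inf(v):
    """'What's the worst single thing that happened?'"""
    return max(abs(x) for x in v)


def l2(v):
    """Rotation-invariant length: no distinguished basis."""
    return math.sqrt(sum(x * x for x in v))


def l1(v):
    """Total push summed over every coordinate: the 'attack norm'."""
    return sum(abs(x) for x in v)


N = 10_000
EPS = 0.01
spread = [EPS] * N               # EPS in every coordinate
spike = [1.0] + [0.0] * (N - 1)  # one unit in a single coordinate

for name, v in [("spread", spread), ("spike", spike)]:
    print(f"{name:6s}  L_inf={l_inf(v):.2f}  L2={l2(v):.2f}  L1={l1(v):.2f}")
```

In general, for a vector with the same value in all N coordinates, the L¹ norm is √N times the L² norm and N times the L∞ norm, so the mismatch grows with the number of channels.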