An extension of the attack is to use some simple AI techniques to generate so many spam posts that regular users get tired of downvoting them, so the only votes the ML system can learn from are the sock-puppet votes. New-account status is easily under the attacker’s control (just create a bunch of accounts ahead of time and wait until they’re no longer new). So it seems fairly easy for an attacker to ensure you don’t have case #1.
The spam from this kind of attacker would be much harder to deal with than the kind of spam we typically see today, since the attacker isn’t trying to get anyone to click a link or take some other action, just to generate poor-quality content that can’t be automatically detected as such.
Just to highlight where the theoretical analysis goes wrong:
We have some tradeoff between “letting spam through” (of the type these attackers are posting) and “blocking good content.”
The attackers here are able to create arbitrary amounts of spam.
So the worst case is already arbitrarily bad. (Assuming our loss function is really a sum over posts.)
So the issue is mostly incentives: this gives an attacker an incentive to generate large amounts of innocuous but quality-lowering spam. It still doesn’t make the worst case any worse; if you had actual adversarial users, you were screwed all along under these assumptions.
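To spell out the “arbitrarily bad worst case” step (the notation here is mine, not from the original analysis): if the loss really is a sum over posts,

L = \sum_{i=1}^{N} \ell(\mathrm{post}_i),

and each spam post that slips past moderation contributes some loss c > 0, then an attacker who gets even a small fraction \epsilon of N_{\mathrm{spam}} spam posts through forces

L \ge \epsilon \, c \, N_{\mathrm{spam}},

which grows without bound as N_{\mathrm{spam}} does.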
In my dissertation research I usually make some limiting assumption on the attacker that prevents this kind of attack, in particular I assume one of:
At least some small fraction (say 10%) of users of the system are honest—the attacker can’t completely overwhelm honest users.
We have access to an external social network, and at least some small fraction (say 10%) of friends of honest users are honest—the attacker can’t completely swamp the social networks of honest users.
Under these conditions we can potentially keep the work per honest user modest (each person must stomp out roughly 10 crappy responses; see the arithmetic below). Obviously it is better if you can get the 10% up to 50% or 90%, e.g. by imposing a cost for account creation, and without such costs it’s not even clear if you can get 10%. Realistically I think that the most workable solution is to mostly use outside relationships (e.g. FB friendships), and then to allow complete outsiders to join by paying a modest cost or using a verifiable real-world identity.
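To spell out where the ~10 figure comes from (assuming, purely for illustration, one spam post per attacker account and one downvote needed per post): with an honest fraction \alpha of users, the attacker controls at most a 1 - \alpha fraction of accounts, so the spam workload per honest user is at most

\frac{1 - \alpha}{\alpha} = \frac{0.9}{0.1} = 9 \approx 10 \text{ posts at } \alpha = 0.1.

Raising \alpha to 50% or 90% drops this to 1 or about 0.1 posts per honest user.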
I haven’t analyzed virtual moderation under these kinds of assumptions, though I expect we could.
I agree that virtual moderation may create stronger incentives for spam+manipulation and so hasten the day when you need to start being more serious about security, and that over the short term that could be a fatal problem. But again, if there is someone with an incentive to destroy your forum and they are able to create an arbitrary number of perfect shills, you need to somehow limit their ability anyway, there just isn’t any way around it.
(For reference, I don’t think the LW shills are near this level of sophistication.)
The first question of my first security-related job interview was, “If someone asked you to determine whether a product, for example PGP, is secure, what would you do?” I parroted back the answer that I had just learned from a book, something like, “First figure out what the threat model is.” The interviewer expressed surprise that I had gotten the answer right, saying that most people would just dive in and try to attack the cryptosystem.
These days I think the answer is actually wrong. It’s really hard to correctly formalize all of the capabilities and motivations of all potential adversaries, and once you have a threat model it’s too tempting to do some theoretical analysis and think, ok, we’re secure under this threat model, hence we’re probably secure. And this causes you to miss attacks that you might have found if you just thought for a few days or months (or sometimes just a few minutes) about how someone might attack your system.
In this case I don’t fully follow your theoretical analysis, and I’m not sure what threat model it assumed precisely, but it seems that the threat model neglected to incorporate the combination of the motivation “obtain power to unilaterally hide content (while otherwise leaving the forum functional)” and the capability “introduce new content as well as votes”, which is actually a common combination among real-world forum attackers.
How so? Since security cannot be absolute, the threat model is basically just placing the problem into appropriate context. You don’t need to formalize all the capabilities of attackers, but you need to have at least some idea of what they are.
and think, ok, we’re secure under this threat model, hence we’re probably secure
That’s actually the reverse: hardening up under your current threat models makes you more secure against the threats you listed but doesn’t help you against adversaries which your threat model ignores. E.g. if your threat model doesn’t include a nation-state, you’re very probably insecure against a nation-state.
You don’t need to formalize all the capabilities of attackers, but you need to have at least some idea of what they are.
But you usually already have an intuitive idea of what they are. Writing down even an informal list of attackers’ capabilities at the start of your analysis may just make it harder for you to subsequently think of attacks that use capabilities outside of that list. To be clear, I’m not saying never write down a threat model, just that you might want to brainstorm about possible attacks first, without having a more or less formal threat model potentially constrain your thinking.
But you usually already have an intuitive idea of what they are
The point is that different classes of attackers have very different capabilities. Consider e.g. a crude threat model which posits five classes:
Script kiddies randomly trawling the ’net for open vulnerabilities
Competent hackers specifically targeting you
As above, but with access to your physical location
People armed with subpoenas (e.g. lawyers or cops)
Black-ops department of a large nation-state
A typical business might then say “We’re going to defend against classes 1–3 and will not even try to defend against 4–5. We want to be sure class 1 gets absolutely nowhere, and we will try to make life very difficult for class 3 (but no guarantees)”. That sounds like a reasonable starting point to me.
We can say something like: for any fixed sequence of prediction problems, the predictions made by a particular ML algorithm are nearly as good as if we had used the optimal predictor from some class (with appropriate qualifiers), and in particular as good as if we had set the weights of all adversarial users to 0. There is no real threat model.
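Schematically, the guarantee is the standard no-regret bound from online learning (the exact constants and the comparison class \mathcal{H} depend on the algorithm and the qualifiers mentioned above):

\sum_{t=1}^{T} \ell(\hat{y}_t, y_t) \le \min_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell(h(x_t), y_t) + o(T).

If \mathcal{H} contains the predictor that assigns weight 0 to every adversarial user, the learner’s total loss is nearly as low as that predictor’s, without ever needing to identify who the adversaries are.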
The blog post really didn’t come with a claim about security; I didn’t even note the above fact while writing the blog post, I pointed it out in response to the question “Why do you think ML would withstand a determined adversary here?” The blog post did come with the claim “I think this will eventually work well,” and, in discussion, “I think we can just try it and see.” This was partly motivated by the observation that the setting is low stakes and the status quo implementations are pretty insecure.
(I’m clarifying because I will be somewhat annoyed if this blog post and discussion are later offered as evidence about my inability to think accurately about security, which seems plausible given the audience. I would not be annoyed if they were used as evidence that I am insufficiently attentive to security issues when thinking about improvements to stuff on the internet, though I’m not yet convinced of that given the difference between generating ideas and implementing step 2: “Spend another 5-10 hours searching for other problems and considerations.”)
I guess the real way to deal with this is to say “each comment imposes some cost to evaluate; you need to pay that cost when you submit a comment,” and then to use an independent mechanism to compensate contributions. That seems like a much bigger change though.
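As a very rough sketch of the shape such a mechanism might take (the class, the deposit amount, and the refund rule here are hypothetical illustrations, not a concrete proposal):

```python
# Hypothetical sketch of a "pay the evaluation cost up front" comment
# mechanism. Deposit size, refund rule, and what happens to forfeited
# deposits are all illustrative assumptions.

class CommentMarket:
    def __init__(self, deposit=1.0):
        self.deposit = deposit   # cost charged when a comment is submitted
        self.balances = {}       # user -> spendable balance
        self.escrow = {}         # comment_id -> (user, amount held)

    def submit(self, user, comment_id):
        """Charge the evaluation cost up front and hold it in escrow."""
        if self.balances.get(user, 0.0) < self.deposit:
            raise ValueError("insufficient balance to cover evaluation cost")
        self.balances[user] -= self.deposit
        self.escrow[comment_id] = (user, self.deposit)

    def resolve(self, comment_id, judged_spam):
        """After moderation: refund the deposit for non-spam, forfeit it for spam."""
        user, amount = self.escrow.pop(comment_id)
        if not judged_spam:
            self.balances[user] = self.balances.get(user, 0.0) + amount
        # Forfeited deposits could fund the independent mechanism that
        # compensates good contributions.
```

Spammers then pay for every post whether or not it slips through, while honest users get their deposits back; rewarding good contributions is handled separately, which is part of what makes this a bigger change.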