Is it OK for LW admins to look at DM metadata for spam prevention reasons?
Sometimes new users show up and spam a bunch of other users in DMs (in particular high-profile users). We can’t limit DM usage to only users with activity on the site, because many valuable DMs get sent by people who don’t want to post publicly. We have some basic rate limits for DMs, but of course those can’t capture many forms of harassment or spam.
Right now, admins can see how many DMs a user has sent, but not who they have messaged, without making a manual database query, which we have a policy of not doing unless we have a high level of suspicion of malicious behavior. However, I feel like it would be quite useful for identifying who is doing spammy things if we could also see who users have sent DMs to, though of course this might feel bad from a privacy perspective to people.
So I am curious about what others think. Should admins be able to look at DM metadata to help us identify who is abusing the DM system? Or should we stick to aggregate statistics like we do right now? (React or vote “agree” if you think we should use DM metadata, and react or vote “disagree” if you think we should not use DM metadata).
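For context on the "basic rate limits" mentioned above, a DM rate limit is typically something like a sliding-window counter per sender. This is a hypothetical sketch, not LessWrong's actual implementation; the class name and thresholds are invented:

```python
from collections import deque


class DMRateLimiter:
    """Sliding-window rate limiter: allow at most `limit` DMs per `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.sent: dict[str, deque] = {}  # user id -> timestamps of recent sends

    def allow(self, user_id: str, now: float) -> bool:
        """Return True (and record the send) if the user is under their limit."""
        q = self.sent.setdefault(user_id, deque())
        # Drop timestamps that have fallen outside the window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.limit:
            return False
        q.append(now)
        return True
```

As the post notes, a pure rate limit like this only catches volume, not content, which is why it can't capture many forms of harassment or spam.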
I have no expectation of strong privacy on the site. I do expect politeness in not publishing or using my DM or other content, but that line is fuzzy and monitoring for spam (not just metadata; content and similarity-of-content) is absolutely something I want from the site.
For something actually private, I might use DMs to establish a mechanism. Feel free to look at that.
If you -do- intend to provide real privacy, you should formalize the criteria, and put up a canary page that says you have not been asked to reveal any data under a sealed order.
edit to add: I am relatively paranoid about privacy, and also quite technically savvy in the implementation of such. I’d FAR rather the site just plainly say “there is no expectation of privacy, act accordingly” than that it try to set expectations otherwise, but then have to move the line later. Your Terms of Service are clear, and make no distinction for User Generated Content between posts, comments, and DMs.
An obvious thing to have would be a very easy “flag” button that a user can press if they receive a DM, and if they press that we can look at the DM content they flagged, and then take appropriate action. That’s still kind of late in the game (I would like to avoid most spam and harassment before it reaches the user), but it does seem like something we should have.
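A flag flow like this can be privacy-preserving by construction: flagging reveals only the flagged message, and only the recipient can flag. A minimal sketch, with invented names (`DM`, `FlagQueue` are not real LessWrong types):

```python
from dataclasses import dataclass, field


@dataclass
class DM:
    id: int
    sender: str
    recipient: str
    body: str


@dataclass
class FlagQueue:
    """Recipient-initiated flags: mods may read only the flagged message itself."""

    visible_to_mods: dict = field(default_factory=dict)  # dm id -> DM

    def flag(self, dm: DM, flagged_by: str) -> None:
        # Only the recipient can flag, and flagging reveals just this one message,
        # not the rest of the conversation.
        if flagged_by != dm.recipient:
            raise PermissionError("only the recipient may flag a DM")
        self.visible_to_mods[dm.id] = dm
```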
I wonder if you could also do something like, have an LLM evaluate whether a message contains especially-private information (not sure what that would be… gossip/reputationally-charged stuff? sexually explicit stuff? planning rebellions? doxxable stuff?), and hide those messages while looking at other ones.
Though maybe that’s unhelpful because spambot authors would just create messages that trigger these filters?
This is going in the wrong direction. If privacy from admins is important (I argue that it’s not for LW messages, but that’s a separate discussion), then breaches of privacy should be exceptions for specific purposes, not allowed everywhere except for “really secret contents”.
Don’t make this filter-in for privacy. Make it filter-out: if it’s detected as likely spam, THEN take more intrusive measures. Privacy-preserving measures include quarantining, asking a few recipients whether they consider it harmful before delivering (or not) the rest, automated content filters, etc. This infrastructure requires a fair bit of data-handling work to get right, plus a mitigation process where a sender can find out they’re blocked and explicitly ask the moderator(s) to allow the message.
The reason I suggest making it filter-in is that it seems easier to make a filter that accurately detects a lot of sensitive stuff than one that accurately detects spam, because “spam” is kind of open-ended. Or I guess in practice spam tends to be porn bots and crypto scams? (Even on LessWrong?!) But truly sensitive talk seems disproportionately likely to involve cryptography and/or sexuality, so trying to filter for porn bots and crypto scams seems relatively likely to reveal sensitive stuff.
The filter-in vs filter-out distinction in my proposal is not so much about the degree of visibility. You could guard my filter-out proposal with the other filter-in proposals, e.g. only showing metadata and only inspecting suspected spammers, rather than making it available for everyone.
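To make the filter-in vs filter-out distinction concrete, here is a toy sketch. Keyword lists stand in for the real classifiers (which the thread suggests might be LLMs); the marker words and function names are invented for illustration:

```python
# Filter-out: escalate a message for review only if it looks like spam.
SPAM_MARKERS = {"crypto", "giveaway", "click here"}
# Filter-in: hide a message from review if it looks especially sensitive.
SENSITIVE_MARKERS = {"abuse", "medical", "legal"}


def looks_like_spam(message: str) -> bool:
    text = message.lower()
    return any(marker in text for marker in SPAM_MARKERS)


def looks_sensitive(message: str) -> bool:
    text = message.lower()
    return any(marker in text for marker in SENSITIVE_MARKERS)


def reviewable(message: str) -> bool:
    # The two proposals compose: only suspected spam is reviewed at all
    # (filter-out), and sensitive-looking content is excluded even then
    # (filter-in as an extra guard).
    return looks_like_spam(message) and not looks_sensitive(message)
```

The composition in `reviewable` is the point of the last comment above: the filter-out gate decides *whether* to look, and the filter-in gate can still narrow *what* is shown.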
I did have a pretty strong expectation of privacy for LW DMs. That was probably dumb of me.
This is not due to any explicit or implicit promise by the mods or the site interface I can recall. I think I was just automatically assuming that strong DM privacy would be a holy principle on a forum with respectable old-school internet culture around anonymity and privacy. This wasn’t really an explicitly considered belief. It just never occurred to me to question this. Just like I assume that doxxing is probably an offence that can result in an instant ban, even though I never actually checked the site guidelines on that.
The site is not responsible for my carelessness on this, but if there was an attention-grabbing box in the DM interface making it clear that mods do look at DMs and DM metadata under some circumstances that fall short of a serious criminal investigation or an apocalypse, I would have appreciated that.
FWIW, de facto I have never looked at DMs or DM metadata unless multiple people reached out to us about a person spamming or harassing them, and even then we only looked at the DMs that that person sent.
So I think your prior here wasn’t crazy. It is indeed the case that we’ve never acted against it, as far as I know.
I think it’s fine if the users are clearly informed about this happening, e.g. the DM interface showing a small message that explains how metadata is used. (But I think it shouldn’t be any kind of one-time consent box that’s easy to forget about.)
Yeah, agree. (Also agree with Dagon in not having an existing expectation of strong privacy in LW DMs. Weak privacy, yes, like that mods wouldn’t read messages as a matter of course.)
Here’s how I would think to implement this unintrusively: a little ℹ️-type icon in a top corner of the DM interface (or next to the “Conversation with XYZ” header, or something). Clicking the icon toggles a writeup about the circumstances in which information from the message might be sent to someone else (what information, and to whom).
Given the site’s relatively limited cybersecurity, I think there’s a good chance of LessWrong being hacked by outside parties and privacy being breached. Truly sensitive message content, like AI-safety-related secrets, probably shouldn’t flow through LessWrong private messages.
One class of cases where people might really want privacy is reporting abuse by other people. If Alice writes a post about how Bob abused her, Carol might want to write Alice a message saying that Bob abused her as well, while caring about privacy because Carol fears retaliation.
I think it would be worth having an explicit policy about how such information is handled, but looking at the DM metadata seems to me like it wouldn’t cause huge problems.
In an ideal world (perhaps not reasonable given your scale), you would have some sort of permissions and logging around sensitive types of queries on DM metadata. (E.g., perhaps you would let any Lighthaven team member see an aggregate dashboard number like “rate of DMs from accounts <1 month in age, compared to historic baseline,” but “how many DMs has Bob (an account over 90 days old) sent to Alice” would require more guardrails.)
Edit: to be clear, I am comfortable with you doing this without such logging at your current scale and think it is reasonable to do so.
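The tiered-access idea above can be sketched in a few lines: aggregate queries are open, per-user lookups require a stated reason and are written to an audit log that other admins could review. All names here are hypothetical, not LessWrong's actual schema:

```python
import time
from dataclasses import dataclass


@dataclass
class AuditEntry:
    admin: str
    query: str
    reason: str
    timestamp: float


class MetadataAccess:
    """Tiered access to DM metadata: aggregates are open; per-user lookups
    require a stated reason and are recorded for peer review."""

    def __init__(self, dm_counts: dict):
        self.dm_counts = dm_counts  # user id -> number of DMs sent
        self.audit_log: list = []

    def aggregate_count(self) -> int:
        # Low sensitivity: site-wide totals need no logging.
        return sum(self.dm_counts.values())

    def per_user_count(self, admin: str, user: str, reason: str) -> int:
        # High sensitivity: refuse without a reason, and record who asked what.
        if not reason.strip():
            raise PermissionError("per-user metadata queries require a stated reason")
        self.audit_log.append(
            AuditEntry(admin, f"per_user_count({user!r})", reason, time.time())
        )
        return self.dm_counts.get(user, 0)
```

Making `audit_log` visible to everyone with the same access level gets the peer-review property the next comment describes.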
In a former job where I had access to logs containing private user data, one of the rules was that my queries were all recorded and could be reviewed. Some of them were automatically visible to anyone else with the same or higher level of access, so if I were doing something blatantly bad with user data, my colleagues would have a chance of noticing.
Yeah, I’ve been thinking of setting up something like this.
Could you make this a report-based system? If a user reports potential spam, ask for their reasons during submission and ask for consent to look over the messages (between the reporter and the alleged spammer); if multiple people report the same person, it will be obvious that the account is spamming via DM.
edit: just saw previous comment on this too
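The "multiple people reported the same person" trigger can be a simple distinct-reporter threshold, so one annoyed user (or one user reporting repeatedly) can't open an account for inspection on their own. A minimal sketch with an invented class name and threshold:

```python
from collections import defaultdict


class ReportTracker:
    """Flag an account once `threshold` *distinct* users have reported it;
    repeat reports from the same reporter don't count twice."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.reporters = defaultdict(set)  # reported user -> set of reporters

    def report(self, reported: str, reporter: str) -> bool:
        """Record a report; return True once the account crosses the threshold."""
        self.reporters[reported].add(reporter)
        return len(self.reporters[reported]) >= self.threshold
```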
Okay if send rate gives you a reason to think it’s spam. Presumably you can set up a system that lets you invade the messages of new accounts sending large numbers of messages that doesn’t require you to cross the bright line of doing raw queries.
I’d be ~entirely comfortable with this given some constraints (e.g. a simple heuristic which flags the kind of suspicious behaviour for manual review, and wouldn’t capture the vast majority of normal LW users). I’d be slightly but not strongly uncomfortable with the unconstrained version.
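The "simple heuristic" constraint in the last two comments might look something like this: flag only new accounts sending many DMs to many distinct recipients, for manual review rather than automatic action. The thresholds are invented for illustration and would need tuning so that normal users essentially never trip them:

```python
def suspicious(account_age_days: float, dms_sent_last_day: int,
               distinct_recipients: int) -> bool:
    """Toy heuristic: flag for manual review (never automatic action) accounts
    that are both new and messaging many distinct users. Thresholds are
    illustrative, not tuned values."""
    return (account_age_days < 7
            and dms_sent_last_day >= 10
            and distinct_recipients >= 8)
```

Because all three conditions must hold, an established account, or a new account having a long back-and-forth with one person, is never flagged.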