Privacy and Manipulation


My parents taught me the norm of keeping my promises.

My vague societal culture taught me a norm of automatically treating certain types of information as private.

My vague rationalist culture taught me norms that include:

Eliezer’s post about meta-honesty was one of the most influential posts I’ve read in the past few years, and among the posts that inspired the coordination frontier. I was impressed that Eliezer looked at ethical edge cases, and wasn’t content to make a reasonable judgment call and declare himself done.

He went on to think through the ramifications of various policies, devise a potential new norm/protocol, and examine reasons that protocol might or might not work. He noted considerations like [paraphrased] “It matters that the norm be simple enough that people can reliably understand and use it.” Or, quoted directly: “This norm is too subtle for Twitter. It might be too subtle for us, too.”

From this post, I derived a (not-quite-spelled-out) norm of “when you try to navigate the edge cases of your norms, try thinking through the underlying principles. But don’t try to be too clever, and consider the ways your edge-case-handling may fail to scale.”

With this in mind, I want to walk through one of the ethical dilemmas I reflected on while writing Norm Innovation and Theory of Mind. This is more of an object-level post, primarily a followup to my Privacy Practices sequence. But it seemed like a useful illustrative example for the Coordination Frontier concept.

Privacy norms can be wielded as an obfuscating weapon

Sometimes, privacy is wielded as a tool to enable manipulation.

I’ve run into a couple people who exploited my good faith /​ willingness to keep things confidential, as part of an overall manipulative pattern. Unfortunately, I don’t feel comfortable going too far into the details here (please don’t speculate in the comments), which makes it a bit harder to sanity check.

“Manipulation” is tricky to define. It’s a blurry line between “communicating normally” and “communicating in a way that systematically distorts another person’s thinking and controls their behavior against their wishes”. I’d like to say “it’s hard to define but you know it when you see it”, but often it’s hard to see because manipulation systematically tries not to be seen.

I’ve met some people who seemed deliberately manipulative, and some people who might have been well intentioned, but in the end it didn’t matter. They interacted with me (and others) in a way that felt increasingly uncomfortable, and which seemed to be harming people. They skirted lines, and wove narratives that made it feel awkward for me to criticize them or think clearly.

One of the key strategies they employed was to make it feel awkward to get help from other people to think or sanity check things. And one tactic in that strategy was pushing for confidentiality – sometimes explicitly, sometimes implicitly.

Explicit promises I regret

One person (call them Dave) asked for explicit promises of confidentiality on a number of occasions, sometimes after-the-fact. We once had a long conversation about their worldview and worries they had, which ended with me saying “so, wait, is this meant to be confidential?”. They responded, somewhat alarmed-seeming, with “oh definitely I would never have told you all this if I thought you might share it.”

At the time I found that somewhat annoying, but agreed to keep it confidential and didn’t think much of it. (Nowadays I try to notice when a conversation is veering into sensitive topics, and have a quick meta-conversation about confidentiality preferences in advance.)

The most frustrating thing came under ideal-privacy-conditions: Dave asked me to make a specific promise of confidentiality before telling me something. I agreed. Then they told me some stories that included somebody harming themselves as a result of interaction with Dave.

Later on, a number of people turned out to be having bad interactions with Dave. Many of them had had similar conversations with him. Some of those conversations had included promises of confidentiality. Others had not. It gradually became clear that Dave was not being honest.

What became really frustrating was that a) it was actually important to figure out whether Dave was harmful, and that was much harder to do without sharing notes, and b) more infuriatingly, most of the information had been given to some people without conditions of confidentiality, but it was still hard to talk about openly without betraying promises.

I think it’s important to take promises seriously. But in this case I think many of the promises had been a mistake. Part of this is because I think people should generally make fewer privacy promises in the first place.

At the time, I decided to reveal some bits of the information when it seemed really important, while acknowledging to myself that this made me a less trustworthy person, in some ways. This seemed worth it, because if I hadn’t revealed the information, I’d be revealing myself to be untrustworthy in other ways – I’d be the sort of person who was vulnerable to manipulative attacks. Integrity isn’t just about being honest, it’s about being functional and robust. Sometimes it involves hard tradeoffs in no-win scenarios.

It feels important to me that I internalize that hit to my integrity. That’s part of why I’m writing this blogpost – it’s sometimes necessary to violate your internal moral code (including keeping promises). But, when I do, I want people to know that I take it seriously. And I want people to have an accurate model of me.

But in this case, at least, the solution going forwards is pretty simple: I now try to avoid making such promises in the first place.

Instead, I include a clause saying “in rare circumstances, if I come to believe that this was a part of a manipulative pattern that is harming people, I may carefully share some of the information with other people.” Most of the time this is fine, because most private-information is obviously not the sort of thing that’s likely to be interpreted as part of a manipulative pattern (assuming you trust me to have remotely sane judgment).

There are some cases where you have information that you’d like to share with me, that I actually want to hear, which is the sort of thing that could easily be construed as manipulative and/or harmful, and which requires more trust than you currently have in my judgment. (Dave would have recognized this to be true of the conversation he was asking me to keep confidential.)

I am not sure what to do in those cases.

I think I would never commit to 100% reliable confidentiality. But, if the conversation seemed important, I’d first have a lengthy conversation about meta-honesty and meta-privacy. I might ask Alice for ~2 confidants that we both trust (from different parts of the social graph), who I might go to to get help evaluating whether Alice is manipulating me.

Implicit confidentiality and incriminating evidence

Another person (call them Carla) never extracted a promise of confidentiality from me. But they took advantage of a vague background norm. Normally, if someone comes to me talking about something troubling them, I try to keep it private by default. If someone is hurting and expresses vulnerability, I want them to feel safe talking through a problem with me (whether they’re just venting, or trying to devise a solution).

Sometimes, this includes them talking about times they screwed up, or ways they harmed people. And in most cases (that I have experienced) it still seemed correct to keep that confidential by default – the harm was relatively minor. Meanwhile, there was value in helping someone with an “Am I the asshole?” kind of question.

But some of my talks with Carla veered into “man, this is actually a red flag that should have prompted me to a higher level of attention”, where I should have considered not just how to help Carla, but how to investigate whether Carla was harming others and what to do about it.

In one notable case, the conversation broached a subject that might have been explicitly damning of Carla. I asked a clarifying question about it. She said something evasive and avoided answering the question. I let it slide.

If I had paid more attention, I think I could have updated on Carla not being trustworthy much sooner. (In fact, it was another few years before I made the update, and Carla is no longer welcome in my day-to-day life). I now believe Carla had a conscious strategy of revealing different parts of herself with different people, making people feel awkward for violating her trust, and using that to get away with harmful behavior in relatively-plain-sight.

I’m a bit unsure how harshly to judge myself. Noticing manipulation, evasion, and adversarial action is legitimately hard. My guess is that at the time, it was a little beyond my skillset to have noticed and taken appropriate action. It’s not useful to judge yourself harshly for things you couldn’t really have done better.

But it wasn’t unreasonably beyond my skillset-at-the-time. And in any case, by this point, it is within my skillset. I hold myself now to the standard of paying attention if someone is skirting the line of confessing something harmful.

What do you do if someone does confess something harmful, though?

It’s still generally good to have a norm of “people can come to each other expressing vulnerabilities.” It’s bad if Alice has to worry “but, if I express a vulnerability that Bob decides is actually bad, Bob will reveal it to other people and I will get hurt.”

Most of the time, I think it is quite good for Alice to feel safe coming to me, even if I think she’s being a bit of a jerk to someone. It’s only in rare cases that I think it makes sense to get a sanity-check from someone else.

I don’t think I ever owed Carla the security of a promise. But, it still matters whether people can generally expect to feel safe sharing vulnerable information with me.

Reneging on Confidentiality Pro-Socially

I don’t have a great solution. But, here is my current algorithm for how to handle this class of situation:

First, put a lot of upfront effort into talking publicly about my privacy policies, so that they’re already in the water and ideally, Alice already knows about them.

Second, notice as soon as Alice starts sharing something vulnerable, and say “hey, is this something you want me to keep confidential? If so I’d like to chat a little about how I do confidentiality.” (What happens next depends on the exact situation. But at the very least I convey that I’m not promising confidentiality yet. And if that means Alice isn’t comfortable sharing, she should stop and we should talk about it more at the meta level)

Third, if I think that Alice is manipulating me and I’m not sure what to do, get help from a confidant who promises a high degree of confidentiality. Share as little information as possible with them so that they can help me form my judgment about the situation. Try to get as much clarity as I can, while violating as few implicit or explicit expectations of privacy as possible.

Fourth, ????. Maybe I decide the red flag is actually just a yellow flag, and Alice is fine, and I continue to help Alice with her problem. If I believe Alice’s behavior is manipulative and harmful, but that she’s mostly acting in good faith, maybe I talk directly to her about it.

And, in (I hope rare?) cases, ask a couple other confidants, and if everyone seems to agree that Alice is acting adversarially, maybe start treating her as an adversary.

The Catch-All Escape Clause

Once upon a time, I didn’t have a special clause in my privacy policy for “manipulative patterns”. I made promises that didn’t caveat that potentiality. I had the moral-unluck to have to deal with some situations without having thought them through in advance, and took a hit to my integrity because of that.

It seems quite plausible this will not be the last time I discover a vulnerability in my privacy practices, or my general commitment-making practices.

So, it currently seems like I should include more general escape clauses. If you are trusting me with something important, you are necessarily trusting my future judgment. I can promise that, if I need to renege on a promise, I will do so as pro-socially as I can. (i.e. try to internalize as much of the cost as I can, and try to adjust my overall policies to avoid having to renege further in the future)

Communities of Robust Agents

These are my current practices. I’m not confident they are best practices. I think this is a domain where it is particularly important that a social network has shared assumptions (or at least, common knowledge of divergent assumptions).

It matters whether a community is a safe space to vulnerably reveal challenges you’re facing.

It matters whether a community can sanely discuss literal infohazards.

It also matters whether a community can notice when someone is using the guise of vulnerability or infohazards to obfuscate a pattern of harm, or a power grab.

I aspire to be a robust agent, and I hope that my community can be a robust community. The social circles I run in are trying to do complicated things, for which there is no common wisdom.

They require norms that are intelligently designed, not culturally evolved. I want to have norms that are stable upon reflection, that are possible to talk about publicly, that people can inspect and agree “yes, these norms are the best tradeoffs we could make given the circumstances.”