[Edit: I’m going to re-write this at some point, I don’t think I managed to say much of anything very clearly. I’ve removed the first section.]

The essay takes the same starting assumption as I do. Honesty is deadly important and something you work hard at. There are some edge cases though, and Eliezer tries to make a simple rule that captures a lot of them.

Eliezer’s general approach is not case-based as mine is, but more law-based, where he looks for general rules, and considers individual cases insofar as they provide him with forced-moves from the rule-based perspective. The things that motivate him aren’t individual experiences, but are:

The fact that it’s easier to always say things that are technically true if you’ve got high verbal intelligence, and that this means requiring absolute honesty is unfairly easier for some than for others.
The massive number of counterfactual versions of you who would like you to keep plausible deniability when answering questions like “What did you do last night?”.
The case of hiding Jews in the attic and answering to Nazis who ask if you are hiding Jews.
Robin Hanson’s problem of automatic norms, where people judge others for not immediately knowing their own norms. This induces (amongst many things) some self-doubt about whether your norms are as obviously correct as you’ve been thinking.

Instead of coming up with bits of advice here and there, he comes up with a new rule.

Eliezer’s rule is fairly straightforward. He acknowledges that, yes, there are situations where even a very honest person will fail to be honest. In line with what I’ve said above, there are many edge cases.

And so he simply suggests that on top of trying hard to be honest throughout life, you should be absolutely honest about where you’ll likely be honest and dishonest.

Here’s how he puts it.

Be at least as honest as an unusually honest person. Furthermore, when somebody asks for it and especially when you believe they’re asking for it under this code, try to convey to them a frank and accurate picture of the sort of circumstances under which you would lie. Literally never swear by your meta-honesty that you wouldn’t lie about a hypothetical situation that you would in fact lie about.

In many of the edge cases I listed above, I have genuinely sometimes communicated falsehoods to someone (wittingly and unwittingly). The important thing that Eliezer adds is that if they or others were to ask me in general about such situations, I would tell them (or at least explicitly remain silent if I felt I had to).

“Yes, if I feel backed into a corner by a pathological institution who requires me to say certain things with conviction in order to even get on a plane, I am likely to say those things.”

“Yes, if I feel I’ve totally run out of cognitive resources and don’t expect to be able to replenish them in the next couple of months, I may tell you a simple but slightly misleading answer that will lead to the right local action.”

“Yes, if I believed a friend of mine was being unjustly imprisoned entirely by a corrupt government, I would be likely to act in a way I determine most helpful to my friend being released, and may state outright falsehoods.”

And so on. Eliezer writes:

[Living out meta-honesty in real life means] stopping and asking yourself “Would I be willing to publicly defend this as a situation in which unusually honest people should lie, if somebody posed it as a hypothetical?”

I can’t require everyone around me to be more honest than I think is possible for myself. Nonetheless, we need to find our way to a strong position where we can discuss the edge cases and get an understanding of how we each draw those boundaries. If we have differences in the boundaries and don’t hold ourselves to this standard, it will be very hard to cooperate and build agreements on trust, not least because we’re misleading each other about our thoughts and actions in these areas.

So when we’re talking more generally about honesty, it’s important to be able to home in on the reasoning we use when making calls in the tough cases, and not pull any clever tricks on the meta level.

I don’t know whether there is a single standard of honesty that I can apply to all people. Some of us are stronger than others, and will be able to uphold honesty under harsher situations than others. I know people stronger than me, I know people weaker than me. But we need to be open about that, and talk about it. And when discussing that, we should be absolutely honest.

Virtues and Rules

It’s good to be an honest person. I commonly take the virtue ethics perspective on honesty, and as a virtue, I think it’s one of the most important ones to live up to, up there with courage and curiosity and maybe one or two others.

I think that Eliezer’s post is attempting to be a straightforward, somewhat theoretical addition to that vision—or rather, it’s an attempt to make a reflectively consistent update to that vision. I think if I advised someone to be an honest person, and then gave them the further advice in Eliezer’s post, this would seem like a natural extension and not obviously inconsistent. I’m not certain that all of Eliezer’s details are correct, and I look forward to others thinking more about this, but it feels like distinct progress to me.

Now consider deontology, where we try to make our morals a little more explicit and write down rules. As above, there are many edge cases, and the deontology does not weather too close an examination. But I’ve tried at times to live up to the deontological rule of pure honesty, and it’s been intensely refreshing. For example: starting to say that you’ll do something later, stopping when you realise you’re making a false promise, and instead stating the true state of affairs (“...I would like to do that for you… but I can’t do it right this minute… and you should actually only assign a 30% probability to me remembering to do it later unprompted. Having realised that, let me get out my phone and set a reminder.”). It’s a great step toward becoming a robust agent.

Can We Build Common Knowledge Of Meta-Honesty?

The question to grapple with here is not whether this analysis of honesty is a valuable step toward understanding the underlying Laws, but whether, in its current form, we can build a communal norm around meta-honesty. Thou shalt be meta-honest. Is understanding meta-honesty something we can build common knowledge around? I ask whether we can build common knowledge of meta-honesty, not of ‘meta-honesty’. The substance, not the linguistic name.

Eliezer is worried about miscommunication of this idea leading to an erosion of norms. He writes:

So there’s still a very obvious thing that could go wrong in people’s heads, a very obvious way that the notion of “meta-honesty” could blow up, or any other code besides “don’t say false things” could blow up. It’s why the very first description in the opening paragraphs says “Don’t lie when a normal highly honest person wouldn’t, and furthermore…” and you should never omit that preamble if you post any discussion of this on your own blog. THIS IS NOT THE IDEA THAT IT’S OKAY TO LIE SO LONG AS YOU ARE HONEST ABOUT WHEN YOU WOULD LIE IF ANYONE ASKS. It’s not an escape hatch.

If anything, meta-honesty is the idea that you should be careful enough about when you break the rule “Don’t lie” that, if somebody else asked the hypothetical question, you would be willing to PUBLICLY DEFEND EVERY ONE OF THOSE EXTRAORDINARY EXCEPTIONS as times when even an unusually honest person should lie.

How clear is the post?

When trying to offer practical advice in the form of rules, rather than trying to infinitely justify every definition and clause against all possible objections, I tend to think the best approach is simply to show people what the thought processes of a person using the rule look like. Show-don’t-tell. This is a skillset that Eliezer has worked harder on than maybe anyone else I know, and I think Eliezer’s essay genuinely gets across much of that kind of high-bandwidth information, especially sections 6 and 7, which show the details that sway his internal dialogue in concrete (and sometimes fictional) situations.

Er, actually, saying that aloud I’m not so sure[1]. I went into Eliezer’s essay several times before I felt I understood it all, especially the key sections 6 and 7, with all their extended discussion about glomarizing. He even wrote a small Harry-Dumbledore dialogue to help, and it was still tough. My friends also needed multiple reads. On initially reading the post, I thought it was proposing the norm that you could lie as long as you were open about that fact. The fact that Eliezer stated the opposite in capital letters was important and helpful in keeping me from misunderstanding him, but I still didn’t get what he was saying the first time.

I found reading the post very valuable, because of the way Eliezer is constantly looking for his actions to be bound by laws of reasoning, which helps me see the sharp corners and edges of what honesty is, but I have to admit the post is very abstract, and even for many excellent people and close friends, I couldn’t hand this essay to them and be confident it wouldn’t (on net) confuse them. Getting some of these abstract chains of reasoning right is a tough high-wire act, and while Eliezer does a good job and I learned useful things, I do not expect I personally could have a very safe conversation with someone else while framing our conversation as an attempt to stick to the rules of consistent object-level glomarization when discussing meta-level principles.

I feel like I am likely going to be in situations where the term ‘meta-honesty’ is common knowledge, but the implementation details aren’t. I think that I can have the conversations this post is trying to have, but that if we try to speak in the same way that this post speaks, I expect we’ll all regularly hit very fraught double-illusions of transparency. In such a situation people will perceive a large number of norm violations that are actually linguistic confusions. Linguistic confusions are not strongly negative in most domains, merely a nuisance, but in prosecuting norm violations, especially when doing so quickly, they have grave consequences. You shouldn’t try to build communal norms on such shaky foundations.

What happens when we have common knowledge of the terminology but not the substance?

Let me be more concrete about this. I think that people can have these conversations, if they put in the effort each time to specify what they’re talking about.

“Would you lie in this situation?”
“Er, feels like you’re asking an awfully specific question. Of course I’m happy to answer in general though.”
“Does <this> feel like a general example?”
“Nearly. Is <this nearby example> okay for you?”
“Yeah. What would you do there?”
“Happy to answer. <This.>”

Rather than

“Seems like you just lied to that person.”
“I don’t think so.”
“Well, let’s talk. Are you meta-honest?”
“Yeah.”
“Would you lie in <this> nearby situation?”
“Hmm, I don’t think I have to tell you.”
“Huh? The whole point of meta-honesty is that you have to tell me.”
“Well, I can speak generally, but I should glomarise if I think you’re getting too object-level.”
“Can you give me a definition of object-level here?”
“Uh, I feel a bit lost.”
“I think you’re being evasive on the meta-level, and you’re never supposed to do that.”

When both people are trying to define terms that neither of them invented and that both are new to… well, I don’t think this is a conversation we want to trust norm-disputes to.

Another framing: If anyone ever says “I’m not honest, but I’m meta-honest” in a social setting where we didn’t already have common knowledge about how meta-honesty is kinda like a reflectively stable form of high-effort honesty, then I would mostly expect the new common knowledge to be “We here don’t require everyone to stick to the deontological rule of being honest.” I think even if everyone in the room had read Eliezer’s post it wouldn’t change the situation.

The main point for me here is that the post is not intuitive, and we can’t currently expect to build common knowledge of the ideas or of the meaning of the language, nor expect to be able to conduct subsequent norm-disputes around it. To give a contrast, I think a post like Arguments About Fast Takeoff is fine to be part of the common knowledge pool from this perspective, because it’s not trying to define terms to build a new norm around.

To point to the same thing: If Eliezer’s post was simply “Here are a bunch of conversations I have with myself about honesty” and they were a bunch of dialogues with no new term-and-norm proposed, I’d feel safer that people didn’t feel they were allowed to expect others to understand exactly what they meant. But I think, from the perspective of advocacy for a new term-and-norm, this post doesn’t hit the (admittedly very high) bar for making it clear and transparent to all who read it.

The problem here is indeed the idea of automatic norms (from Robin Hanson) that Eliezer discusses. Linguistic confusion around takeoff speeds is not a problem (I mean, in the long-run it’s an existential catastrophe, but in the short run it’s still an interesting conversation), but linguistic confusion over honesty norms can spiral into quite an escalation of perceived norm violations, which can be very damaging.

While I think this post managed to say surprising and useful things about the rules of honesty, I think it has not hit the (very high) bar of making sure its readers have common knowledge of a new language and norm. I think, if this post were to be included in the “Best of 2018” sequence and book, it would be very valuable to have just a few accompanying paragraphs saying that when you ask “Can we operate under the norms of meta-honesty?” you must accept “I’d like to taboo the term ‘meta-honesty’, because I’m not sure we’ll be talking about the same thing if we use that term.”

I’m not saying “we should renounce meta-honesty”. I view it as a valuable idea, in its early stages of being communicated. The idea has been discovered, but the work to make it shared amongst us all has only just begun. As people re-explain it, and flesh it out, I have hope it will become a simple idea for LessWrongers to build shared norms around.

Takeaways From This Review

Being honest is hard, and there are many weird and difficult edge-cases, such as those involving context failures, negotiating with powerful institutions, politicised narratives, and compute limitations.
On top of the rule of trying very hard to be honest, Eliezer’s post offers an additional general rule for navigating the edge cases. The rule is that when you’re having a general conversation all about the sorts of situations you would and wouldn’t lie, you must be absolutely honest. You can explicitly not answer questions if it seems necessary, but you must never lie.
I think this rule is a good extension of the general principle of honesty, and appreciate Eliezer’s theoretical arguments for why this rule is necessary.
Eliezer’s post introduces some new terminology for discussions of honesty, in particular the term ‘meta-honesty’ for the rule, as distinct from ‘honesty’.
If the term ‘meta-honesty’ is common knowledge but the implementation details aren’t, and if people try to use it, then they will perceive a large number of norm violations that are actually linguistic confusions. Linguistic confusions are not strongly negative in most fields, merely a nuisance, but in discussions of norm-violation (e.g. a court of law) they have grave consequences, and you shouldn’t try to build communal norms on such shaky foundations.
I, and many other people this post was directed at, find that it requires multiple readings to understand, so I think that even if everyone reads this post, that will not be remotely sufficient for making the implementation details common knowledge, even if the term can become that.
In general, I think that everyone should make sure it is acceptable, when asking “Can we operate under the norms of meta-honesty?” for the other person to reply “I’d like to taboo the term ‘meta-honesty’, because I’m not sure we’ll be talking about the same thing if we use that term.”
This is a valuable bedrock for thinking about the true laws, but it is not currently at a stage where we can build simple deontological communal rules around it. I’d like to see more posts on both fronts.

Footnote

[1] Subroutine: Check for literal truth of words.

[Review] Meta-Honesty (Ben Pace, Dec 2019)
