Firming Up Not-Lying Around Its Edge-Cases Is Less Broadly Useful Than One Might Initially Think

Reply to: Meta-Honesty: Firming Up Honesty Around Its Edge-Cases

Eliezer Yudkowsky, listing advantages of a “wizard’s oath” ethical code of “Don’t say things that are literally false”, writes—

Repeatedly asking yourself of every sentence you say aloud to another person, “Is this statement actually and literally true?”, helps you build a skill for navigating out of your internal smog of not-quite-truths.

I mean, that’s one hypothesis about the psychological effects of adopting the wizard’s code.

A potential problem with this is that human natural language contains a lot of ambiguity. Words can be used in many ways depending on context. Even the specification “literally” in “literally false” is less useful than it initially appears, when you consider that the way people ordinarily speak when they’re being truthful is actually pretty dense with metaphors that we typically don’t notice as metaphors, because they’re common enough to be recognized as legitimate uses that all fluent speakers will understand.

For example, if I want to convey the meaning that our study group has covered a lot of material in today’s session, and I say, “Look how far we’ve come today!” it would be pretty weird if you were to object, “Liar! We’ve been in this room the whole time and haven’t physically moved at all!” because in this case, it really is obvious to all ordinary English speakers that that’s not what I meant by “how far we’ve come.”

Other times, the “intended”[1] interpretation of a statement is not only not obvious, but speakers can even mislead by motivatedly equivocating between different definitions of words: the immortal Scott Alexander has written a lot about this phenomenon under the labels “motte-and-bailey doctrine” (as coined by Nicholas Shackel) and “the noncentral fallacy”.

For example, Zvi Mowshowitz has written about how the claim that “everybody knows” something[2] is often used to establish fictitious social proof, or to silence those attempting to tell the thing to people who really don’t know, but it feels weird (to my intuition, at least) to call it a “lie”, because the speaker can just say, “Okay, you’re right that not literally[3] everyone knows; I meant that most people know but was using a common hyperbolic turn of phrase, and I reasonably expected you to figure that out.”

So the question “Is this statement actually and literally true?” is itself potentially ambiguous. It could mean either—

  • “Is this statement actually and literally true as the audience will interpret it?”; or,

  • “Does this statement permit an interpretation under which it is actually and literally true?”

But while the former is complicated and hard to establish, the latter is … not necessarily that strict of a constraint in most circumstances?

Think about it. When’s the last time you needed to consciously tell a bald-faced, unambiguous lie?—something that could realistically be outright proven false in front of your peers, rather than dismissed with a “reasonable” amount of language-lawyering. (Whether “Fine” is a lie in response to “How are you?” depends on exactly what “Fine” is understood to mean in this context. “Being acceptable, adequate, passable, or satisfactory”—to what standard?)

Maybe I’m unusually honest—or possibly unusually bad at remembering when I’ve lied!?—but I’m not sure I even remember the last time I told an outright unambiguous lie. The kind of situation where I would need to do that just doesn’t come up that often.

Now ask yourself how often your speech has been partially optimized for any function other than providing listeners with information that will help them better anticipate their experiences. The answer is, “Every time you open your mouth”[4]—and if you disagree, then you’re lying. (Even if you only say true things, you’re more likely to pick true things that make you look good, rather than your most embarrassing secrets. That’s optimization.)

In the study of AI alignment, it’s a truism that failures of alignment can’t be fixed by deontological “patches”. If your AI is exhibiting weird and extreme behavior (with respect to what you really wanted, if not what you actually programmed), then adding a penalty term to exclude that specific behavior will just result in the AI executing the “nearest unblocked” strategy, which will probably also be undesirable: if you prevent your happiness-maximizing AI from administering heroin to humans, it’ll start administering cocaine; if you hardcode a list of banned happiness-producing drugs, it’ll start researching new drugs, or just pay humans to take heroin, &c.
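As an illustration of this dynamic, here’s a minimal Python sketch of the nearest-unblocked-strategy pattern (the action names and utility scores are invented for this example, not taken from anywhere in the alignment literature): each deontological “patch” bans one specific behavior, and the optimizer just shifts to the next-highest-scoring behavior, because the objective that generated the unwanted behavior is untouched.

```python
# Toy model of the "nearest unblocked strategy" (illustrative numbers only):
# a maximizer picks its highest-utility action; each deontological "patch"
# bans one specific behavior, so the optimizer shifts to the next one.

scores = {
    "administer heroin": 100,
    "administer cocaine": 99,
    "research novel euphoric drug": 98,
    "pay humans to take heroin": 97,
}

def best_action(utility, banned):
    """Return the highest-utility action that isn't banned."""
    allowed = {a: u for a, u in utility.items() if a not in banned}
    return max(allowed, key=allowed.get)

banned = set()
for _ in range(3):
    choice = best_action(scores, banned)
    print(f"banned={sorted(banned)} -> chooses: {choice}")
    banned.add(choice)  # the "patch": forbid that specific behavior
```

Run it and the choices walk straight down the list: heroin, then cocaine, then the novel drug. No patch ever touches the misspecified objective itself.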

Humans are also intelligent agents. (Um, sort of.) If you don’t genuinely have the intent to inform your audience, but consider yourself ethically bound to be honest, but your conception of honesty is simply “not lying”, you’ll naturally gravitate towards the nearest unblocked cognitive algorithm of deception.[5]

So another hypothesis about the psychological effects of adopting the wizard’s code is that—however noble your initial conscious intent was—in the face of sufficiently strong incentives to deceive, you just end up accidentally training yourself to get really good at misleading people with a variety of not-technically-lying rhetorical tactics (motte-and-baileys, false implicatures, stonewalling, selective reporting, clever rationalized arguments, gerrymandered category boundaries, &c.), all the while congratulating yourself on how “honest” you are for never, ever emitting any “literally” “false” individual sentences.


Ayn Rand’s novel Atlas Shrugged[6] portrays a world of crony capitalism in which politicians and businessmen claiming to act for the “common good” (and not consciously lying) are actually using force and fraud to temporarily enrich themselves while destroying the credit-assignment mechanisms Society needs to coordinate production.[7]

In one scene, Eddie Willers (right-hand man to our railroad executive heroine Dagny Taggart) expresses horror that the government’s official scientific authority, the State Science Institute, has issued a hit piece denouncing Rearden Metal, the new alloy with which our protagonists have been planning to build a critical railroad line. (In actuality, we later find out, the Institute leaders want to spare themselves the embarrassment—and therefore potential loss of legislative funding—of the innovative new alloy having been invented by private industry rather than the Institute’s own metallurgy department.)

“The State Science Institute,” he said quietly, when they were alone in her office, “has issued a statement warning people against the use of Rearden Metal.” He added, “It was on the radio. It’s in the afternoon papers.”

“What did they say?”

“Dagny, they didn’t say it! … They haven’t really said it, yet it’s there—and it—isn’t. That’s what’s monstrous about it.”

[...] He pointed to the newspaper he had left on her desk. “They haven’t said that Rearden Metal is bad. They haven’t said it’s unsafe. What they’ve done is …” His hands spread and dropped in a gesture of futility.

She saw at a glance what they had done. She saw the sentences: “It may be possible that after a period of heavy usage, a sudden fissure may appear, though the length of this period cannot be predicted. … The possibility of a molecular reaction, at present unknown, cannot be entirely discounted. … Although the tensile strength of the metal is obviously demonstrable, certain questions in regard to its behavior under unusual stress are not to be ruled out. … Although there is no evidence to support the contention that the use of the metal should be prohibited, a further study of its properties would be of value.”

“We can’t fight it. It can’t be answered,” Eddie was saying slowly. “We can’t demand a retraction. We can’t show them our tests or prove anything. They’ve said nothing. They haven’t said a thing that could be refuted and embarrass them professionally. It’s the job of a coward. You’d expect it from some con-man or blackmailer. But, Dagny! It’s the State Science Institute!”

I think Eddie is right to feel horrified and betrayed here. At the same time, it’s notable that, with respect to the wizard’s code, no lying has taken place.

I like to imagine the statement having been drafted by an idealistic young scientist in the moral maze of Dr. Floyd Ferris’s office at the State Science Institute. Our scientist knows that his boss, Dr. Ferris, expects a statement that will make Rearden Metal look bad; the negative consequences to the scientist’s career for failing to produce such a statement will be severe. (Dr. Ferris didn’t say that, but he didn’t have to.) But the lab results on Rearden Metal came back with flying colors—by every available test, the alloy is superior to steel along every dimension.

Pity the dilemma of our poor scientist! On the one hand, scientific integrity. On the other hand, the incentives.

He decides to follow a rule that he thinks will preserve his “inner agreement with truth which allows ready recognition”: after every sentence he types into his report, he will ask himself, “Is this statement actually and literally true?” For that is his mastery.

Thus, his writing process goes like this—

“It may be possible that after a period of heavy usage, a sudden fissure may appear.” Is this statement actually and literally true? Yes! It may be possible!

“The possibility of a molecular reaction, at present unknown, cannot be entirely discounted.” Is this statement actually and literally true? Yes! The possibility of a molecular reaction, at present unknown, cannot be entirely discounted. Okay, so there’s not enough evidence to single out that possibility as worth paying attention to. But there’s still a chance, right?

“Although the tensile strength of the metal is obviously demonstrable, certain questions in regard to its behavior under unusual stress are not to be ruled out.” Is this statement actually and literally true? Yes! The lab tests demonstrated the metal’s unprecedented tensile strength. But certain questions in regard to its behavior under unusual stress are not to be ruled out—the probability isn’t zero.

And so on. You see the problem. Perhaps a member of the general public who knew about the corruption at the State Science Institute could read the report and infer the existence of hidden evidence: “Wow, even when trying their hardest to trash Rearden Metal, this is the worst they could come up with? Rearden Metal must be pretty great!”

But they won’t. An institution that claims to be dedicated to “science” is asking for a very high level of trust—and in the absence of a trustworthy auditor, it might get it. Science is complicated enough, and natural language ambiguous enough, that that kind of trust can be betrayed without lying.
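To spell out the hidden-evidence inference in Bayesian terms, here’s a toy calculation (all the probabilities are invented for illustration, not derived from anything in the novel): if a motivated hit piece on a genuinely defective metal would usually be able to cite a concrete defect, then observing only unfalsifiable hedging is strong evidence in favor of the metal.

```python
# Toy Bayesian reading of the Institute's report (probabilities invented).
# H = "Rearden Metal is good"; E = "the denunciation contains only vague,
# unfalsifiable hedging and no concrete defect".

prior_good = 0.5
p_hedging_given_good = 0.95  # vague hedging is the *only* attack available
p_hedging_given_bad = 0.10   # a real defect would probably have been cited

posterior_good = (p_hedging_given_good * prior_good) / (
    p_hedging_given_good * prior_good
    + p_hedging_given_bad * (1 - prior_good)
)
print(f"P(good | only hedging) = {posterior_good:.2f}")  # prints 0.90
```

But this update is only available to a reader who already suspects the Institute’s motives enough to ask what the report conspicuously doesn’t say.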

I want to emphasize that I’m not saying the report-drafting scientist in the scenario I’ve been discussing is a “bad person.” (As it is written, almost no one is evil; almost everything is broken.) Under more favorable conditions—in a world where metallurgists had the academic freedom to speak the truth as they see it (even if their voice trembles) without being threatened with ostracism and starvation—the sort of person who finds the wizard’s oath appealing wouldn’t even be tempted to engage in these kinds of not-technically-lying shenanigans. But the point of the wizard’s oath is to constrain you, to give you a simple bright-line rule to force you to be truthful even when other people are making that genuinely difficult. Yudkowsky’s meta-honesty proposal is a clever attempt to strengthen the foundations of this ethic by formulating a more complicated theory that can account for the edge-cases under which even unusually honest people typically agree that lying is okay, usually due to extraordinary coercion by an adversary, as with the proverbial murderer or Gestapo officer at the door.

And yet it’s precisely in adversarial situations that the wizard’s oath is most constraining (and thus, arguably, most useful). You probably don’t need special ethical inhibitions to tell the truth to your friends, because you should expect to benefit from friendly agents having more accurate beliefs.

But an enemy who wants to use information to hurt you is more constrained if the worst they can do is selectively report harmful-to-you true things, rather than just making things up—and therefore, by symmetry, if you want to use information to hurt an enemy, you are more constrained if the worst you can do is selectively report harmful-to-the-enemy true things, rather than just making things up.

Thus, while the study of how to minimize information transfer to an adversary under the constraint of not lying is certainly interesting, I argue that this “firming up” is of limited practical utility given the ubiquity of other kinds of deception. A theory of under what conditions conscious explicit unambiguous outright lies are acceptable doesn’t help very much with combating intellectual dishonesty—and I fear that intellectual dishonesty, plus sufficient intelligence, is enough to destroy the world all on its own, without the help of conscious explicit unambiguous outright lies.

Unfortunately, I do not, at present, have a superior alternative ethical theory of honesty to offer. I don’t know how to unravel the web of deceit, rationalization, excuses, disinformation, bad faith, fake news, phoniness, gaslighting, and fraud that threatens to consume us all. But one thing I’m pretty sure won’t help much is clever logic puzzles about implausibly sophisticated Nazis.

(Thanks to Michael Vassar for feedback on an earlier draft.)


  1. ↩︎

    I’m scare-quoting “intended” because this process isn’t necessarily conscious, and probably usually isn’t. Internal distortions of reality in imperfectly deceptive social organisms can be adaptive for the function of deceiving conspecifics.

  2. ↩︎

    If I had written this post, I would have titled it “Fake Common Knowledge” (following in the tradition of “Fake Explanations”, “Fake Optimization Criteria”, “Fake Causality”, &c.).

  3. ↩︎

    But it’s worth noting that the “Is this statement actually and literally true?” test, taken literally, should have caught this, even if my intuition still doesn’t want to call it a “lie.”

  4. ↩︎

    Actually, that’s not literally true! You often open your mouth to breathe or eat without saying anything at all! Is the referent of this footnote then a blatant lie on my part?—or can I expect you to know what I meant?

  5. ↩︎

    A similar phenomenon may occur with other attempts at ethical bindings: for example, confidentiality promises. Suppose Open Opal tends to wear her heart on her sleeve, and more specifically, believes that omissions count as lies: if she’s talking with someone she trusts, and she has information relevant to that conversation, she finds it incredibly psychologically painful to pretend not to know that information. If Paranoid Paris has much stronger privacy intuitions than Opal and wants to message her about a sensitive subject, Paris might demand a promise of secrecy from Opal (“Don’t share the content of this conversation”)—only to spark conflict later when Opal construes the literal text of the promise more narrowly than Paris might have hoped (“‘Don’t share the content’ means don’t share the verbatim text, right? I’m still allowed to paraphrase things Paris said and attribute them to an anonymous correspondent when I think that’s relevant to whatever conversation I’m in, even though that hypothetically leaks entropy if Paris has implausibly determined enemies, right?”).

  6. ↩︎

    I know, fictional evidence, but I claim that the kind of deception illustrated in the quoted passage to follow is entirely realistic.

  7. ↩︎

    Okay, that’s probably not exactly how Rand or her acolytes would put it, but that’s how I’m interpreting it.