The Onion Test for Personal and Institutional Honesty
Figuring out the edge cases around honesty and truth seems important to me, both as a matter of personal aesthetics and as a matter for LessWrong to pay attention to. One of the things people have used to describe what makes LessWrong special is that it's a community focused on truth-seeking, which makes "what is truth anyway, and how do we talk about it?" a worthwhile topic of conversation. This article takes up that topic, and does so clearly. (The positive-example/negative-example pattern is a good approach to a subject that can really suffer from the illusion of transparency.)
Like Eliezer's Meta-Honesty post, the approach suggested here relies on some fast verbal footwork, though the footwork need not be as fast as Meta-Honesty's. Passing the Onion Test consistently requires the same kind of comparison to alternate worlds as glomarization, which is a bit of a strike against it, but that's hardly unique to the Onion Test.
I don't know whether people might still wind up feeling misled. For instance, I can imagine someone saying "I usually keep my financial state private" and their conversation partners walking away with wildly different ideas of how they're doing. Is it so bad they don't want to talk about it? Is it so good they don't want to brag? If I thought it was the former and offered to cover their share of dinner repeatedly, I might be annoyed if it turned out to be the latter.
I don't particularly hold myself to the Onion Test, but it did provide another angle on the subject that I appreciated. Nobody has yet invoked it this way around me, but I could also see the Onion Test being declared in a manner similar to Crocker's Rules: an opt-in social norm that might be recognized by others if it got popular enough. I'm not sure it's worth one of the limited conceptual slots a community has for such norms, but I wouldn't feel the slot was wasted if Onion Tests made it that far.
This might be weird, but I really appreciate people having these conversations about what they think is honest, and in what ways they think we should be honest, out loud on the internet where I can read them. One can't assume that everyone has read your article on how you use truth and is thus fairly warned, but it is at least a start. Good social thing to do, A+. I don't know whether more people thinking about this means we'd actually find a real consensus solution, and it's probably not actually the priority, but I would like a real consensus solution, and at some point someone is going to have to write down the prototype that leads to it.
Ultimately I don't actually want this in the Best of 2022, not because it isn't good, but because I'd worry a little about someone reading through the Best Of collections and thinking this was more settled or established than it is. The crux is that I don't think it's settled, established, or widely read enough that people will know what you mean if you declare an Onion Test. If I knew everyone on LessWrong would read everything in the Best of 2022, then I'd change my mind and want this included, so as to add the Test to our collective lexicon.
This is an example of a clear textual writeup of a principle of integrity. I think it’s a pretty good principle, and one that I refer to a lot in my own thinking about integrity.
But even if I thought it was importantly flawed, I think integrity is super important, and therefore I really want to reward and support people thinking explicitly about it. That allows us to notice that our notions are flawed and improve them, and it also allows us to declare to each other what norms we hold ourselves to, instead of sort of typical-minding, assuming that our notion of integrity matches others', and then being shocked when they behave badly on our terms.
I think about this framing quite a lot: is what I say going to lead people to assume roughly the thing I think, even if I'm not precise? So the concept is pretty valuable to me.
I don’t know if it was the post that did it, but maybe!