In your previous comment you’re talking to Wei Dai, though. Do you think Wei Dai is going to misinterpret the werewolf concept in this manner? If so, why not link to the original post to counteract the possible misinterpretation, instead of implying that the werewolf frame itself is wrong?
(meta note: I’m worried here about the general pattern of people optimizing discourse for “the public” who is nonspecific and assumed to be highly uninformed / willfully misinterpreting / etc, in a way that makes it impossible for specific, informed people (such as you and Wei Dai) to communicate in a nuanced, high-information fashion)
[EDIT: also note that the frame you objected to (the villagers vs werewolf frame) contains important epistemic content that the “let’s incentivize non-obfuscatory behavior” frame doesn’t, as you agreed in your subsequent comment after I pointed it out. Which means I object even more strongly to saying “the villagers/werewolf frame is bad” with the defense that “people might misinterpret this”, without offering a frame that contains the useful epistemic content of the misinterpretable frame.]
I do agree that this is a pattern to watch out for. I don’t think it applies here, but could be wrong. I think it’s very important that people be able to hold themselves to higher standards than what they can easily explain to the public, and it seems like a good reflex to notice when people might be trying to do that and point it out.
But I’m worried here about well-informed people caching ideas wrongly, not about the general public. More to say about this, but first want to note:
also note that the frame you objected to (the villagers vs werewolf frame) contains important epistemic content that the “let’s incentivize non-obfuscatory behavior” frame doesn’t, as you agreed in your subsequent comment after I pointed it out.

Huh—this just feels like a misinterpretation or reading odd things into what I said.
It had seemed obvious to me that to disincentivize obfuscatory behavior, you need people to be aware of what obfuscatory behavior looks like and what to do about it, and it felt weird that you saw that as something different.
It is fair that I may not have communicated that well, but that’s part of my point – communication is quite hard. Similarly, I don’t think the original werewolf post really communicates the thing it was meant to.
“Am I a werewolf?” is not a particularly useful question to ask, and neither is “is so-and-so a werewolf?”, because the answer is almost always “yes, kinda” (and what exactly you mean by “kinda” is doing most of the work). But, nonetheless, this is the sort of question that the werewolf frame prompts people to ask.
I’m worried about this, concretely, because of what happened after I read Effective Altruism is Self-Recommending a while back. I thought a lot about it, wrote up detailed responses to it (some of which I posted and some of which I just thought about privately), and ran a meetup somewhat inspired by taking it seriously...

...despite all that, a year ago when I tried to remember what it was about, all I could remember was “GiveWell == Ponzi scheme == bad”, without any context for why the Ponzi scheme metaphor mattered or how the principle was supposed to generalize. I’m similarly worried that a year from now, “werewolves == bad, hunt werewolves” is going to be the thing I remember about this.
The five-word limit isn’t just for the uninformed public; it’s for serious people trying to coordinate. The public can only coordinate around 5-word things. Serious people trying to be informed still have to ingest lots of information and form detailed models, but those models are still going to have major pieces that end up compressed down to about five words. And this is a major part of why many people are confused about Effective Altruism and how to do it right in the first place.
If that’s your outlook, it seems pointless to write anything longer than five words on any topic other than how to fix this problem.
I agree with the general urgency of the problem, although I think the frame of your comment is somewhat off. This problem seems… very information-theoretically entrenched. I have some sense that you think of it as solvable when it’s fundamentally not solvable, only improvable: like trying to build a perpetual motion machine instead of a more efficient engine. There is only so much information people can process.
(This is based entirely off of reading between the lines of comments you’ve made, and I’m not confident what your outlook actually is here, and apologies for the armchair psychologizing).
I think you can make progress on it, which would look something like:
0) make sure people are aware of the problem
1) build better infrastructure (social or technological), which could probably be grouped into a few goals:
nudge readers towards certain behavior
nudge writers towards certain behavior
provide tools that amplify readers’ capabilities
provide tools that amplify writers’ capabilities
2) meanwhile, as a writer, make sure that the concepts you create for the public discourse are optimized for the right kind of compression. Some ideas compress better than others. (I have thought about the details of this.)
This *is* my outlook, and yes, I think both I, as well as you and Jessica, should probably be taking some kind of action that takes this outlook strategically seriously, if you aren’t already.
Distillation Technology
A major goal I have for LessWrong, which the team has talked about a lot, is improving distillation technology. It’s not what we’re currently working on because, well, there are *multiple* top priorities that all seem pretty urgent (and all seem like pieces of the same puzzle). But I think Distillation Tech is the sort of thing most likely to meaningfully improve the situation.
Right now the default mode in which people interact with LessWrong and many other blogging platforms is “write up a thing, post it, maybe change a few things in response to feedback.” But for ideas that are actually going to become building blocks of the intellectual commons, you need to continuously invest in improving them.
Arbital tried to do this, and it failed because the problem is hard in weird ways, many of them somewhat hard to anticipate.
http://distill.pub tackles a piece of this but not in a way that seems especially scalable.
Scott Alexander’s short story Ars Longa Vita Brevis is a fictional account of what seems necessary to me.
I do hope that by the end of this year the LW team will have made some concrete progress on this. I think it is plausibly a mistake that we haven’t focused on it already – we discussed switching gears towards it at our last retreat but it seemed to make more sense to finish Open Questions.
Trying to nudge others seems like an attempt to route around the problem rather than solve it. It seems like you tried pretty hard to integrate the substantive points in my “Effective Altruism is self-recommending” post, and even with pretty extensive active engagement, your estimate is that you only retained a very superficial summary. I don’t see how any compression tech for communication at scale can compete with what an engaged reader like you should be able to do for themselves while taking that kind of initiative.
We know this problem has been solved in the past in some domains—you can’t do a thing like the Apollo project or build working hospitals where cardiovascular surgery is regularly successful based on a series of atomic five-word commands; some sort of recursive general grammar is required, and at least some of the participants need to share detailed models.
One way this could be compatible with your observation is that people have somewhat recently gotten worse at this sort of skill; another is that credit-assignment is an unusually difficult domain to do this in. My recent blog posts have argued that at least the latter is true.
In the former case (lost literacy), we should be able to reconstruct older modes of coordination. In the latter (politics has always been hard to think clearly about), we should at least internally be able to learn from each other by learning to apply cognitive architectures we use in domains where we find this sort of thing comparatively easy.
I think I may have communicated somewhat poorly by phrasing this in terms of 5 words rather than 5 chunks, and will try to write a new post sometime that presents a more formal theory of what’s going on.
I mentioned in the comments of the previous post:

Coordinated actions can’t take up more bandwidth than someone’s working memory (which is something like 7 chunks, and if you’re using all 7 chunks you don’t have any spare chunks to handle weird edge cases).

A lot of coordination (and communication) is about reducing the chunk-size of actions. This is why jargon is useful, and why habits and training are useful (as well as checklists and forms and bureaucracy), since they can condense an otherwise unworkably long instruction into something people can manage.

And:

“Go to the store” is four words. But “go” actually means “stand up, walk to the door, open the door, walk to your car, open your car door, get inside, take the key out of your pocket, put the key in the ignition slot...” etc. (Which are in turn actually broken into smaller steps like “lift your front leg up while adjusting your weight forward.”)

But you are capable of taking all of that and chunking it as the concept “go somewhere” (as well as the meta-concept of “go to the place whichever way is most convenient, which might be walking or biking or taking a bus”), although if you have to use a form of transport you are less familiar with, remembering how to do it might take up a lot of working memory slots, leaving you liable to forget other parts of your plan.
I do in fact expect that the Apollo project worked via finding ways to cache things into manageable chunks, even for the people who kept the whole project in their head.
Chunks can be nested, and chunks can include subtle neural-network-weights that are part of your background experience and aren’t quite explicit knowledge. It can be very hard to communicate subtle nuances as part of the chunks if you don’t have access to high-volume and preferably in-person communication.
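To make the nested-chunk picture a bit more concrete, here is a minimal, purely illustrative Python sketch; the `Chunk` class, the example task tree, and the 7-slot budget are my own toy assumptions, not anything from the original posts. The idea it encodes: a chunk you’ve already learned collapses to one working-memory slot no matter how much structure is folded inside it, while an unfamiliar one has to be held as its raw sub-steps.

```python
# Toy model of nested chunks and working-memory load.
# Assumptions (mine, for illustration): an already-learned chunk costs 1
# working-memory slot; an unlearned chunk costs the sum of its parts.

WORKING_MEMORY_SLOTS = 7  # "about seven chunks"

class Chunk:
    def __init__(self, name, parts=()):
        self.name = name
        self.parts = list(parts)

    def load(self, learned):
        """Working-memory slots needed to execute this chunk."""
        if self.name in learned or not self.parts:
            return 1  # collapses into a single familiar unit
        return sum(part.load(learned) for part in self.parts)

walk = Chunk("walk to the door", [Chunk("stand up"), Chunk("step"), Chunk("step")])
drive = Chunk("drive", [Chunk("open car door"), Chunk("get in"),
                        Chunk("insert key"), Chunk("start engine"),
                        Chunk("navigate")])
go_to_store = Chunk("go to the store", [walk, drive, Chunk("enter store")])

print(go_to_store.load(learned={"walk to the door", "drive"}))  # 3: fits easily
print(go_to_store.load(learned=set()))                          # 9: over the ~7-slot budget
```

The numbers are arbitrary; the point is just that an unfamiliar mode of transport blows the budget in roughly the way the quoted “go to the store” example describes.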
I’d be interested in figuring out how to operationalize this as a bet and check how the project actually worked. What I have heard (epistemic status: heard it from some guy on the internet) is that actually, most people on the project did not have all the pieces in their head, and the only people who did were the pilots.
My guess is that the pilots had a model of how to *use* and *repair* all the pieces of the ship, but couldn’t have built it themselves.
My guess is that “the people who actually designed and assembled the thing” had a model of how all the pieces fit together, but not as deep a model of how and when to use it, and may have only understood the inputs and outputs of each piece.
And meanwhile, while I’m not quite sure how to operationalize the bet, I would bet maybe $50 (conditional on us finding a good operationalization) that the number of people who had the full model or anything like it was quite small. (“You Have About Five Words” doesn’t claim you can’t have more than 5 words of nuance; it claims that you can’t coordinate large groups of people in ways that depend on more than 5 words of nuance. I bet there were fewer than 100 people, and probably closer to 10, who had anything like a full model of everything going on.)
and will try to write a new post sometime that presents a more formal theory of what’s going on

I think I’m unclear on how this constrains anticipations, and in particular it seems like there’s substantial ambiguity as to what claim you’re making, such that it could be any of these:
You can’t communicate recursive structures or models with more than five total chunks via mass media such as writing.
You can’t get humans to act (or in particular to take initiative) based on such models, so you’re limited to direct commands when coordinating actions.
There exist such people, but they’re very few and stretched between very different projects and there’s nothing we can do about that.
??? Something else ???
I think there are two different anticipation-constraining claims, similar to but not quite what you said there:
Working Memory Learning Hypothesis – people can learn complex or recursive concepts, but each chunk that they learn cannot be composed of more than 7 other chunks. You can learn a 49-chunk concept, but first you must distill it into seven 7-chunk concepts, learn each one, and then combine them together.
Coordination Nuance Hypothesis – there are limits to how nuanced a model you can coordinate around, at various scales of coordination. I’m not sure precisely what the limits are, but it seems quite clear that the more people you are coordinating the harder it is to get them to share a nuanced model or strategy. It’s easier to have a nuanced strategy with 10 people than 100, 1000, or 10,000.
I’m less confident of the Working Memory hypothesis (it’s an armchair inside view based on my understanding of how working memory works).
I’m fairly confident in the Coordination Nuance Hypothesis, which is based on observations about how people actually seem to coordinate at various scales and how much nuance they seem to preserve.
In both cases, there are tools available to improve your ability to learn (as an individual), disseminate information (as a communicator), and keep people organized (as a leader). But none of those tools change the fundamental equation; they just change the terms.
Anticipation Constraints:
The anticipation-constraint of the WMLH is “if you try to learn a concept that requires more than 7 chunks, you will fail. If a concept requires 12 chunks, you will not successfully learn it (or will learn a simplified bastardization of it) until you find a way to compress the 12 chunks into 7. If you have to do this yourself, it will take longer than if an educator has optimized it for you in advance.”
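Here is a minimal sketch of that constraint as I read it; the function, the per-step 7-chunk limit, and the example concept graph are all my own illustrative assumptions, not anything from the comment. A concept is only learnable once it has been regrouped so that no single learning step asks for more than 7 chunks at a time.

```python
# Toy version of the Working Memory Learning Hypothesis (WMLH):
# you can only learn a concept whose *direct* decomposition fits in ~7 chunks,
# where already-learned sub-concepts each count as a single chunk.

LIMIT = 7

def learnable(concept, decomposition, known):
    """Return True if `concept` can be learned in one step right now."""
    parts = decomposition.get(concept, [])
    return len(parts) <= LIMIT and all(
        p in known or p not in decomposition  # known chunk, or a primitive piece
        for p in parts
    )

# A 12-part concept presented flat: not learnable in one go.
flat = {"big_idea": [f"piece_{i}" for i in range(12)]}
print(learnable("big_idea", flat, known=set()))  # False

# The same 12 pieces regrouped into two 6-chunk sub-concepts.
grouped = {
    "big_idea": ["first_half", "second_half"],
    "first_half": [f"piece_{i}" for i in range(6)],
    "second_half": [f"piece_{i}" for i in range(6, 12)],
}
known = set()
for concept in ["first_half", "second_half", "big_idea"]:
    assert learnable(concept, grouped, known)
    known.add(concept)
print("big_idea" in known)  # True: learnable once regrouped
```

This also captures the last sentence above: the regrouping is exactly the work a good educator or distiller can do for you in advance, so you don’t have to discover the grouping yourself.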
The anticipation constraint of the CNH is that if you try to coordinate with 100 people of a given level of intelligence, the shared complexity of the plan that you are enacting will be lower than the complexity of the plan you could enact with 10 people. If you try to implement a more complex plan, or orient around a more complex model, your organization will make mistakes due to distorted simplifications of the plan. And this gets worse as your organization scales.
CNH is still ambiguous between “nuanced plan” and “nuanced model” here, and those seem extremely different to me.
I agree they are different but think it is the case that with a larger group you have a harder time with either of them, for roughly the same reasons at roughly the same rate of increased difficulty.
The Working Memory Hypothesis says that Bell Labs is useful, in part, because whenever you need to combine multiple complicated interdisciplinary concepts in order to invent a new concept…

…instead of having to read a textbook that explains each one in one particular way (and, if it’s not your field, you’d need to get up to speed on the entire field in order to have any context at all), you can just walk down the hall and ask the guy who invented the concept “how does this work?” and have them explain it to you multiple times until they find a way to compress it down into 7 chunks, optimized for your current level of understanding.
A slightly more accurate anticipation of the CNH is:
people need to spend time learning a thing in order to coordinate around it. At the very least, the more time you need to spend getting people up to speed on a model, the less time they have to actually act on that model
people have idiosyncratic learning styles, and are going to misinterpret some bits of your plan, and you won’t know in advance which ones. Dealing with this requires individual attention: noticing their mistakes and correcting them. Middle managers (and middle “educators”) can help to alleviate this, but every link in the chain reduces your control over what message gets distributed. If you need 10,000 people to all understand and act on the same plan/model, it needs to be simple or robust enough to survive 10,000 people misinterpreting it in slightly different ways (a toy simulation below illustrates how quickly fidelity degrades at scale).
This gets even worse if you need to change your plan over time in response to new information, since now people are getting it confused with the old plan, or they don’t agree with the new plan because they signed up for the old plan, and then you have to Do Politics to get them on board with the new plan.
At the very least, if you’ve coordinated perfectly, each time you change your plan you need to shift from “focusing on execution” to “focusing on getting people up to speed on the new model.”
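As a rough illustration of the degradation described above (and referenced in the middle-manager bullet), here is a toy “telephone game” simulation; every parameter in it (the 5% garble rate per relay, the branching factor of 10) is made up purely for illustration. A plan of a given number of chunks is relayed down a management tree, each relay independently garbles each chunk with small probability, and we look at what fraction of the front line ends up holding the intended plan.

```python
# Toy "telephone game" for the Coordination Nuance Hypothesis:
# a plan is relayed through layers of middle managers, and each relay
# independently garbles each chunk with probability p. All numbers are
# made up for illustration.
import random

random.seed(0)

def fraction_intact(n_people, n_chunks, p_garble=0.05, branching=10):
    """Fraction of front-line people who receive every chunk un-garbled."""
    # Depth of the management tree needed to reach n_people.
    depth = 1
    while branching ** depth < n_people:
        depth += 1
    intact = 0
    for _ in range(n_people):
        # Each chunk must survive `depth` independent relays.
        ok = all(all(random.random() > p_garble for _ in range(depth))
                 for _ in range(n_chunks))
        intact += ok
    return intact / n_people

for n_people in (10, 100, 1000, 10000):
    for n_chunks in (5, 20):
        frac = fraction_intact(n_people, n_chunks)
        print(f"{n_people:>6} people, {n_chunks:>2}-chunk plan: "
              f"{frac:.0%} share the intended plan")
```

The exact percentages mean nothing; the qualitative pattern is the anticipated constraint: at a fixed garble rate, the amount of nuance you can keep shared shrinks as the organization grows.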
Make spaced repetition cards?
The way that I’d actually do this, and plan to do this (in line with Benquo’s reply to you), is to repackage the concept into something that I understand more deeply and that I expect to be able to unpack more easily in the future.
Part of this requires me to do some work for myself (no amount of good authorship can replace putting at least some work into truly understanding something).
Part of this has to do with me having my own framework (rooted in Robust Agency among other things), which is different from Benquo’s framework and from Ben’s personal experience playing werewolf.
But a lot of my criticism of the current frame is that it naturally suggests compacting the model in the wrong way. (To be clear, I think this is fine for a post that represents a low-friction strategy of posting your thoughts and conversations as they form, without stressing too much about optimizing pedagogy. I’m glad Ben posted the Villager/Werewolf post. But I think the presentation makes it harder to learn than it needs to be, and is particularly ripe for being misinterpreted in a way that benefits rather than harms werewolves, and if it’s going to be coming up in conversation a lot I think it’d be worth investing time in optimizing it further.)
That seems like the sort of hack that lets you pass a test, not the sort of thing that makes knowledge truly a part of you. To achieve the latter, you have to bump it up against your anticipations, and constantly check to see not only whether the argument makes sense to you, but whether you understand it well enough to generate it in novel cases that don’t look like the one you’re currently concerned with.
I think it’s possible to use it in a “mindful” way even if most people are doing it wrong? The system reminding you of what you read n days ago gives you a chance to connect it to the real world today, when you otherwise would have forgotten.