The Practice & Virtue of Discernment

ozziegooen 26 May 2021 0:34 UTC

41 points

QURI Virtues Rationality Planning & Decision-Making General Semantics Distinctions

Epistemic status: I think this definition might be useful, but it might also be too abstract, inelegant, or obvious.

Scholarship status: This is based mostly on background knowledge and some work in decision making. It touches topics in data science, General Semantics, and a lot of LessWrong. I’m not an expert in any of these topics.

Many thanks to David Manheim, Nuño Sempere, and Marta Krzeminska for comments, edits, and suggestions for this post.

Key Points

You have a generalization. You split up this generalization and wind up with a structure that’s easier to work with. This is called discernment.
I think discernment is very important, but often overlooked and underappreciated.
I suggest thinking of discernment through the lens of decision analysis. The expected value of information of working on a problem goes up once good discernment is done.
This post includes a long list of overgeneralizations and ways to apply discernment to them. Find the ones that best appeal to you.
We can split up the types of discernment into a bunch of distinct buckets. If you really care, there’s some clarification around where exactly the distinctions lie between what is and isn’t discernment.
Bad discernment is dangerous. Learn and practice safe discernment, avoiding harming yourself and others.
This might all seem obvious. It wasn’t completely obvious to me, especially before I wrote it. If it’s obvious to you, hopefully this post can still be useful for creating common knowledge. If you talk about discernment, you can point people to this post, which has possibly done more work than necessary to pin it down.
If you like the idea of rationality virtues, I suggest discernment as a virtue. If you don’t, I suggest thinking of discernment as an affordance.

Motivation and Definition

Decision making and debate on overgeneral topics seem like an obvious failure mode, but they happen all the time. Take some of the topic titles from the Oxford University Student Union Debates:

Snowden is a Hero
The United States is Institutionally Racist
Islam is Not a Peaceful Religion
We Should NOT Have Confidence in Modi’s Government
Thatcher was not good for Britain
We should NOT reject traditional masculinity
Immigration is Bad for Britain

These questions seem intentionally crafted to cause intellectual chaos. Let’s take, “Thatcher was not good for Britain.” A better discussion could be something like, “Let’s break down Thatcher’s work into several clusters, then select the clusters that are relevant to the decisions we are making today, and figure out what we can learn from each cluster individually.”

Or take, “Immigration is Bad for Britain”. There are several types of immigration, and several relevant groups (including potential future groups) with different interests both within and outside of Britain. It would take a very dogmatic viewpoint, or a premeditated agenda, to put all sorts of immigration in one bucket, or treat all groups in and around Britain as the same.

[Flag: I may well have misunderstood the Oxford Student Union Debates. See the comments of this post for more information here. I think the basic message holds (these sorts of mistakes are commonly done elsewhere), but this example of the Debates might be unfair.]

I’ve been looking into discussions of overgeneralization, and I’ve found two commonly recommended solutions:

1. Use words like “somewhat” a lot.
2. Use more precise statements.

I find 1 underwhelming, but 2 is often not immediately possible. Figuring out a good way to do these breakdowns is hard. It takes a deep domain understanding combined with a lot of trial and error. Let’s call this breakdown work discernment.

Discernment, as I describe it, has two main qualities:
1. A generalization is broken down somehow.
2. The new breakdown is intended to help with decision making.

Another way of expressing this second point is, “research spent using the new breakdown is more productive than research using the generalization.”[1]

It’s easy to forget the option of discernment, and instead spend too much effort attacking concepts that really should be narrowed down. If I were to ask, “How likely is it that this next President will be good for the United States?”, I’d expect to witness a lot of discussion on that topic as stated, instead of narrowing down what it should mean to be “good for the United States” and treating this as a distinct but relevant question. So I hope to help draw some attention to discernment and promote it as an affordance. Any time you evaluate a question, you should keep in mind that you’re allowed to answer not with an analysis or estimate, but with a discernment.

The Twelve Virtues of Rationality discusses a related term, precision:

The tenth virtue is precision. One comes and says: The quantity is between 1 and 100. Another says: The quantity is between 40 and 50. If the quantity is 42 they are both correct, but the second prediction was more useful and exposed itself to a stricter test. What is true of one apple may not be true of another apple; thus more can be said about a single apple than about all the apples in the world.

One way to think of discernment is to see it as the practice of modifying generalizations to enable statements to be precise. Discernment assumes that precision will follow (it needs to be actually used). Precision requires competent discernment to be possible and beneficial. Imagine trying to make precise statements on the debate topics mentioned above without some sort of breakdown.

The LessWrong Virtues tag includes all of the Twelve Virtues, along with several others. I suggest discernment as another candidate.

Examples

Here’s a list of potential candidates for discernment. It’s sort of long, so feel free to skim for the items that best match your interests.

Overgeneralization	Discernment Prompt
Is religion good or bad?	Let’s figure out how to break apart different aspects of religion, understand the benefits and drawbacks of each, and consider how valuable the individual aspects seem to be. Let’s try to understand if it’s possible to adopt the most promising parts and discourage the negative ones.
Is technology going to be amazing or terrible? (technophiles and luddites)	Which particular technology trends are occurring? What concrete things might they cause? Which of these seem positive, and which negative? For which groups of people?
Is functional programming or object oriented programming better?	What concretely differs between these approaches? How do these differences change what can be done easily? What are useful categories of problems that can best be solved using specific types of functional and object oriented programming?
How can we judge how promising charities are?	Can we subdivide the key decision-relevant aspects of charities into components that can be tackled in isolation?
How promising is global warming research?	How can we subdivide global warming research into clusters? What does each type of global warming research accomplish? What would tell us if a cluster is highly promising?
Will there be a nuclear attack in the next 30 years?	What are the different particular types of nuclear and related attacks that we should think about? How could they occur? What actors are relevant to each? Can we assess the probability of each? In what ways can we use outside views? Are the risks correlated?
How can we increase justice?	Do the many things that people call justice fall into clusters? If so, what are these clusters and what does the corresponding network look like? Do these different clusters imply different takes on increasing justice?
How good are management consultants?	What do management consultants do? What particular kinds of management consultants are useful in which circumstances? What are the main risks to watch out for in each situation?
Should we defer trust to experts, smart people, or ourselves? (epistemic modesty)	Can we make a proxy metric that would help tell us in which situations we should defer to which authorities? How should we operate with uncertainty across which group is correct? What allows us to update our estimate in each case?
What is the utility function of this person X?	Does this person seem to act coherently? What would explain observed behavior? Can we separate the sorts of decision relevant utility functions pertaining to person X in a meaningful way? For instance, we can vary the amount of enlightenment.
What is the optimal chair shape for humans?	What goals are served by chairs? What are the key axes that we can use to make several different chairs, or custom chairs, to serve those different goals? (For example, maybe chair height and aesthetic genre.)
Which intellectuals are the most competent?	What qualifies as competence in the domain of interest? (Generating plausible models? Predictive accuracy? Outcomes?) How can we effectively and selectively extract information from particular intellectuals? Are there rankings or clusters of intellectual attributes that we can use to carefully decide what to learn from different ones?

What discernment is and is not

Here’s a list of some things that could count as discernment. This list is not complete. There are some very similar lists of the “tools that data scientists use”, so I suggest checking out those for more and perhaps better ideas.

Things that are discernment

Decomposition
This involves breaking something down into “fundamental” parts that can be neatly combined using mathematics or logic. The results of decompositions are generally complete and mutually exclusive. For example, there are several decompositions of proper scoring rules, and the “Importance, Tractability, and Neglectedness” framework has turned into a composable equation.
Clusters / cluster analysis
This involves separating out a set into (often messy) subgroups in a way that is useful for decision making. The ideal result would be a representation that shows up clearly in a hypothetical cluster analysis, if a real cluster analysis can’t be performed. For example, splitting up programming use cases into categories that can be dealt with distinctly, or performing a Marie Kondo style cleaning process in your home. Clusters are usually not totally mutually exclusive or complete.
Identifying an existing preferable subdivision
Sometimes there’s already a preferable subdivision to focus on, but it’s not immediately evident. For example, one might be deciding what genre of music to listen to, but later realize they should be making decisions on the per-artist level instead. Artists represent an already well established unit (though close enthusiasts would notice that even these can be surprisingly messy). However, it might not have been obvious whether one should focus on a broader category system (music vs. video vs. sport) or a narrower one (subgenre, record, song, song fragment). Getting the scale right is a challenge, a lot like adjusting a camera’s focus.
Ordering
Sometimes a close investigation of a topic doesn’t produce a set of distinct clusters, but rather a linear ordering. For example, you investigate 30 already-defined types of advertising, and evaluate the expected impact of each. The discernment here is in the identification of the ranking system and its use for decision making. It might have been previously assumed that advertising was “one thing”, and was previously evaluated as a whole without breaking it apart like this. I call this an ordering and not a ranking, because there could be circumstances where both sides of the ordering have distinct uses.
Bifurcation (“Just the good parts”)
Instead of doing a full ordering, you can just give each item a binary value. This represents something like “good” vs. “bad”. See JavaScript: The Good Parts for an example. My idealized professors in most topics start their first lecture with: “I get there’s a lot of bad stuff here, I empathize with you. But underneath there’s some exciting work, and I’ve organized this subsection for you.”
Bucket error solving
This means “making it clear that one idea is clearly a set of other things with little in common.” Bucket error solving is a combination of clustering and dissolving. The resulting division might reveal there is nothing valuable unifying the subconcepts. In such cases, the greater concept should be extinguished.
Note: This is also called equivocation, or all the many things described in abramdemski’s post here.
Dissolving
This means “making it clear that something that people thought was a thing clearly isn’t.” This applies to terms that are confused. Sometimes words rely on presuppositions that are false or represent mistaken abstractions that don’t serve a purpose after fundamental questions are resolved.
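
Returning to the decomposition item at the top of this list: as a rough illustration of what “composable” means for the Importance, Tractability, and Neglectedness framework, one commonly cited form multiplies three ratios so that the intermediate units cancel. Which version of the framework is meant is an assumption here, and the wording of each term is illustrative.

```latex
\[
\underbrace{\frac{\text{good done}}{\%\ \text{of problem solved}}}_{\text{importance}}
\times
\underbrace{\frac{\%\ \text{of problem solved}}{\%\ \text{increase in resources}}}_{\text{tractability}}
\times
\underbrace{\frac{\%\ \text{increase in resources}}{\text{extra unit of resources}}}_{\text{neglectedness}}
=
\frac{\text{good done}}{\text{extra unit of resources}}
\]
```

Each factor can be estimated on its own, which is what makes this a decomposition rather than a looser clustering.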

Things that are not discernment

Decision-arbitrary categorization
It’s common to need to subdivide something in some way to make it easier to work with. There are cases where the choice of implementation doesn’t matter very much. Take, for example, most uses of alphabetic and numerical categorization. It’s helpful to organize books in libraries alphabetically, but this feels distinct from discernment work. We could describe this sort of work as organization or categorization. Card sorting, for instance, is used to help identify clusters that people find generally intuitive, rather than ones that come from deep domain insight. It’s still useful, just not the same as discernment.
Gears Level Models
Say you want to understand something functionally. You need to explore its parts in detail, but your intention is to use the pieces to make functional models (think gears-level models). This exploration would be fairly removed from the traditional decision-making structures. Gears level models care a lot about the interactions between different parts, and this isn’t the focus for discernment.
Redefinitions
Sometimes definitions don’t need to be turned into subcategories; it’s enough to modify them a little. This requires the main skills of discernment, but I’m hesitant to expand the scope of my definition to incorporate redefinitions into it. For one, I would expect “designing subcategories that are useful” to be more common and valuable than “redefining one word in isolation”. Please comment if you disagree, I’m not sure here.

Pitfalls

Poorly executed discernment is a common source of tedium and suffering. Discernment requires upfront and continuous costs. Readers need to be educated, occasionally re-educated (when words are removed or modified), and the resulting complexity must be remembered. There are also many possible types of discernment that could wind up causing information value to decrease in expectation.

You could identify subcategories that imply false things. Perhaps you identify a sort of ordering of businesses suited for one very particular purpose, but later other people start using it for very different purposes. Maybe it’s goodharted.

The book Sorting Things Out: Classification and its Consequences goes into detail on how classification can go poorly.

There’s also the problem that sometimes abstractions are already ideal for decision making, and any subdivisions would make things worse. Maybe you really are making a decision on religion as a whole and have a limited time to do so.

Asides:

Relevance to forecasting systems

I think some people assume that prediction markets and forecasting tournaments will give increasingly accurate probabilities on predefined questions. My take is that for reasoning, and especially collective reasoning, to achieve dramatic gains, a lot of discernment will be required. A great forecasting system isn’t one that tells you, “The chance of a nuclear war in the next 10 years is 23%”, but rather one that tells you, “The most tractable types of nuclear war to resist are rogue terrorist threats by one of these three known organizations. We have organized a table of the most effective means of stopping such threats.”

Relevance to learning

Quality discernment of research materials is a great asset to have. A well discerning researcher can isolate the particular parts of both good works and bad ones worth paying attention to. They can find the useful bits of even the most juvenile, evil, or boring fields, and not be worried about wasting time or absorbing bad influences. Discerning people sometimes have deep affections for things that others despise, because there’s sometimes a lot of quality buried within otherwise bad things. (For example, “bad” movies with cult followings). The trick is to be appreciative and curious without being attached. Maybe check out the literature of decoupling vs. contextualizing for more information here.

Value of information

I’ve really been meaning to write this up more formally, but until then, here’s one tidbit I think is relevant here and more generally useful.

Categorization itself can present clear expected value. It’s fairly straightforward to demonstrate that an agent using simple ontology A would have higher total expected value than one using a simple ontology B under some assumptions. As a thought experiment, say an agent can collect coins of different types, but has limited time to do so. If they have a poor categorization system of coins, they are likely to make poor decisions about which ones to go after. For example, it might be crucial that the agent pay attention to the color of the coin (gold vs. silver) instead of the letters on it. This almost exactly mirrors the problem (and value) of feature selection in data science.
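
Here is a minimal sketch of the coin thought experiment. It rests on illustrative assumptions that are not part of the argument above: a coin’s value depends only on its color, the agent can tell coins apart only by the single feature its ontology tracks, and it picks at random within a category. The function `expected_haul`, the coin values, and the population are all hypothetical.

```python
from collections import defaultdict

# Hypothetical population: value depends on color only (gold=10, silver=1);
# the letter stamped on a coin carries no information about its value.
coins = [
    {"color": color, "letter": letter, "value": 10 if color == "gold" else 1}
    for color in ("gold", "silver")
    for letter in ("A", "B")
    for _ in range(5)
]


def expected_haul(coins, feature, picks):
    """Expected total value when the agent can only sort coins by `feature`.

    The agent fills its limited picks from the category with the highest
    average value first. Within a category the coins look identical to it,
    so each pick is worth that category's average in expectation.
    """
    groups = defaultdict(list)
    for coin in coins:
        groups[coin[feature]].append(coin["value"])

    ranked = sorted(
        groups.values(),
        key=lambda values: sum(values) / len(values),
        reverse=True,
    )

    total = 0.0
    remaining = picks
    for values in ranked:
        take = min(remaining, len(values))
        total += take * (sum(values) / len(values))
        remaining -= take
        if remaining == 0:
            break
    return total


by_color = expected_haul(coins, "color", picks=10)
by_letter = expected_haul(coins, "letter", picks=10)
print(by_color, by_letter, by_color - by_letter)
```

Under these assumptions the color ontology lets the agent spend all ten picks on gold coins, while the letter ontology leaves it no better off than picking at random. The gap between the two totals is one crude way to put a number on the value of the better categorization.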

If you’re reading this and know of literature that estimates the value of ontologies using terms similar to value of information calculations, please let me know.

Why emphasize discernment?

I’m uncertain whether discernment as defined is actually a good category to draw attention to. This definition excludes many similar practices.

Refactoring categories (redrawing boundaries, for example)
Redefinitions
Focusing on the more general, instead of the less general

I think these practices are much less common than discernment. This is for a few reasons. People tend to begin with highly general generalizations, so it makes sense that a lot of work would go towards making them more narrow. Refactoring and redefinitions are difficult to promote and encourage. I think it’s much easier to suggest new terms than change existing ones, and discernment prioritizes this work.

So, I expect discernment to be easier and have more potential than the other aspects of categorization, for the main use cases I can think of now.

LessWrong connections

Many LessWrong articles use discernment to suggest new distinctions and subcategories. Several posts discuss the benefits and drawbacks of some sorts of discernments. I haven’t seen popularized terms for doing this discernment work itself, so I hope this term can fit in well.

The related LessWrong material that I’ve read focuses on situations where bad or overlooked distinctions create clear epistemic biases and pitfalls. I’m more interested in situations where breakdowns increase productivity, often by incremental amounts. However, I expect all types are valuable. I’m unsure how the main benefits are distributed.

Here are some relevant tags on LessWrong, in rough order of relevance.

Future Work

Some obvious steps for future work on this topic would be:

Make a big list of real examples of discernment in different fields.
Use the above list to make a better breakdown of the types of discernment and the safety and value of each one.
Better tie the definitions above to those in data science or statistical learning or similar.
I’m sure the math could be taken further and done much better.
Make models of the expected value of making estimates using different sorts of ontologies.

[1] I added the phrase “intended to” to leave the opportunity to discuss effective vs ineffective discernments. I imagine it would be confusing to need some other name for something just like a discernment, but not actually useful. I’m sure that many attempts at effective discernment are useless or harmful.

AnnaSalamon 26 May 2021 14:20 UTC
11 points
Thanks. I just bounced to LW after getting stuck in a tricky bit of writing, and found this helpful for where I was stuck.

I think the main things I found helpful from your post just now were:
1. the examples, which for me recalled the habit of righting a wrong question; and
2. the explicit suggestion that I could take the spirit of “righting a wrong question”, or “dissolving a question”—call it a “virtue”—and steer toward it in the way I might steer toward curiosity or other virtues.
- ozziegooen 27 May 2021 4:38 UTC
  5 points
  Parent
  Thanks so much for being specific about the benefits you got from it, that’s really useful.
Adam Bricknell 26 May 2021 9:08 UTC
7 points
This reminds me of the Ladder of Abstraction in General Semantics: moving up the levels of abstraction to talk about things in general, and moving down to clarify the details that make up each generalisation.
- ozziegooen 27 May 2021 4:43 UTC
  6 points
  Parent
  Good point, thanks. Much of this was inspired by lectures and high-level reading I’ve witnessed around General Semantics. By chance do you recommend any resources in the area? I’ve found the area somewhat difficult to penetrate.
  - Adam Bricknell 27 May 2021 13:22 UTC
    3 points
    Parent
    Agreed on GS being on the opaque side!
    Best thing I’ve read is Language in Thought and Action by Hayakawa. I found it explained the concepts much more clearly than Korzybski’s writings.
    Drive Yourself Sane is meant to be an intro to GS and has good reviews. I can’t remember much of it as I read it >7 years ago; however, I recall it didn’t do as much for me as Hayakawa’s book.
Richard_Kennaway 27 May 2021 7:28 UTC
5 points

These questions seem intentionally crafted to cause intellectual chaos.

They are. That is what Oxford Union debates are for. (Not Oxford University Student Union, which is a different organisation.) They are a platform for buffoonery, strutting one’s intellectual stuff, flamboyant point-scoring, lively tackling of hecklers, arguing any side of any question independently of one’s own beliefs, and auditioning for a career in Parliament. Boris Johnson is a past president of the OU, a position which fitted him perfectly, and you can hear it in him every time you tune in to Prime Minister’s Questions.
ChristianKl 26 May 2021 14:06 UTC
4 points
These questions seem intentionally crafted to cause intellectual chaos.
That seems to misunderstand the rules of BPS debating. It’s completely okay for the headline to be broad because it’s the job of the person who first speaks to give more specificity.
It seems to me like all those topics are more specific than the topic of your post.
Let’s break down Thatcher’s work into several clusters, then select the clusters that are relevant to the decisions we are making today, and figure out what we can learn from each cluster individually
This is obviously a pretty poor debate title by most ideas about what titles are for.
- ozziegooen 27 May 2021 4:41 UTC
  5 points
  Parent
  Thanks for the clarification here. That said, I have watched a few of these debates (partially) on Youtube, and haven’t been very impressed by their abilities in practice to actually give much specificity.
  This is obviously a pretty poor debate title by most ideas about what titles are for.
  I don’t understand what you mean here. Personally I find many of these debates rather poor for real intellectual progress, and I think that a structured attempt at specificity could be an improvement (or at least a useful alternative), but I could definitely be wrong.
  - ChristianKl 27 May 2021 18:55 UTC
    4 points
    Parent
    Thanks for the clarification here. That said, I have watched a few of these debates (partially) on Youtube, and haven’t been very impressed by their abilities in practice to actually give much specificity.
    Watching debates partially if you don’t hear the first speaker, means that you miss the point where terms are defined.
    The first government (which means the 1st and 3rd speaker) has, as Wikipedia calls it, the semi-divine right of definition. Usually, the 1st speaker is supposed to define anything that’s unspecific. If the first speaker leaves anything important undefined, it’s the job of the 2nd speaker to point out unclarity in the terms so that the 3rd speaker can clear that up.
    In addition to defining the terms, the 1st speaker usually also puts forward a bunch of tests. That takes the form of “To show this motion is correct I will demonstrate A, B and C”.
    Let’s take the debate on toxic masculinity. The first speaker defines the terms by saying: “we are rejecting traditional masculinity because it’s toxic and it’s toxic because its powerful tradition is rarely strayed from and remains unchallenged, tradition is an unspoken rule that people are happy to observe, tradition is power, the point of an evolving society is that no construct can stay traditional forever some traditions you can nudge along others need to be dragged into the 21st century”.
    Those are a bunch of very specific claims that are supposed to define the terms of the debate:
    Being toxic means:
    It’s a powerful tradition that is rarely strayed from and remains unchallenged.
    tradition is an unspoken rule that people are happy to observe
    Tradition is power
    In addition:
    No construct can stay traditional forever; some traditions you can nudge along others need to be dragged into the 21st century
    Then you get the tests “traditional masculinity is harmful to men, that it is the cause and the result of a power imbalance which continues to propagate and that the power playing field is being levelled out due to changing gender roles and controlling traditional masculinity unleashes tremendous benefits for society”
    Those are four here:
    traditional masculinity is harmful to men
    that it is the cause and the result of a power imbalance which continues to propagate
    that the power playing field is being levelled out due to changing gender roles
    controlling traditional masculinity unleashes tremendous benefits for society
    Basically 50 seconds are spent for specificity.
    You can argue about how well this particular speaker did her job of specifying the topic and that’s part of what the judge evaluates when it comes to the rankings at the end.
    Personally I find many of these debates rather poor for real intellectual progress, and I think that a structured attempt at specificity could be an improvement (or at least a useful alternative), but I could definitely be wrong.
    Basically, the claim here is that a “structured attempt at specificity” is better than one where it’s the responsibility of one person to provide specificity (and be graded on it).
    I think you are guilty here of what you are charging given that I don’t have any idea what the phrase “structured attempt at specificity” means and how it differs from the incentive based one of the BPS rules.
    You also conflate the question of specificity with the one about whether the debates are good for intellectual progress. We have classic LessWrong arguments that debates generally aren’t good for intellectual progress, which seem to me more central than issues of specificity in debates under BPS rules (or slight variations to make the debate more fun for an audience).
    I don’t understand what you mean here.
    A title of a book, a title of a scientific paper or the title of a LessWrong post are all a lot less specific than their content. There’s no reason for that being any different for the topic of a debate.
    - ozziegooen 1 Jun 2021 1:02 UTC
      2 points
      Parent
      Thanks for the response.
      
      I added a flag to the debate section of this post to show that the example is contested. If I were to ever discuss this in further detail, I’d look into finding other examples.
      There’s a whole lot here, so I’ll try to address some of the points I might be able to help clarify.
      You also conflate the question of specificity with the one about whether the debates are good for intellectual progress.
      I’m less sold that debates are bad for intellectual progress than others on LessWrong. I definitely think that some debates are poor, but have hope that there could be ways to structure them a bit differently to make some types quite good. One thing debates are great for is demonstrating multiple sides of an issue. Around EA/rationalism, sometimes it feels like there’s a lot of uniformity of opinion. I did debate in High School and College and found it quite interesting, though suboptimal.
      There’s no reason for that being any different for the topic of a debate.
      One of the reasons I wrote this post on LessWrong is because I’d like to see such precision being used more here (and in the EA sphere). I’m not saying I’m particularly good at it myself. I imagine it’s a skill that takes time to improve.
      I agree that many areas are lacking in precision, I just used debate as an example because it seemed particularly on the nose. Debate is definitely less relevant or important than those other areas, I don’t really care about it in particular.
      I think you are guilty here of what you are charging given that I don’t have any idea what the phrase “structured attempt at specificity” means and how it differs from the incentive based one of the BPS rules.
      I’m not trying to claim I’m great at discernment or precision. Part of why I investigated it was because I was interested in improving. Virtues are kind of meant to be aspired to. Sorry if this was confusing.
      - ChristianKl 1 Jun 2021 15:34 UTC
        2 points
        Parent
        I agree that many areas are lacking in precision, I just used debate as an example because it seemed particularly on the nose. Debate is definitely less relevant or important than those other areas, I don’t really care about it in particular.
        If I go on Metaculus I have a title of a Metaculus question that’s fairly unspecific. Then I have a few paragraphs explaining the question and often fine print that adds additional precision about how the question gets resolved. Criticising the title for not being precise enough misses the point, given that precision is not the purpose of a title.
        When it comes to “debate” there are multiple different rule sets and some events that call themselves debates which don’t really have a rule set. If you argue that a debate should be structured differently, then the question is about whether it should have different rules.
        The Oxford debate union uses BPS rules (and it seems from the videos that they have some additional factors that get the first speaker to introduce contestants in the debate for events with an audience like the one on the Youtube channel).
        One big problem with debates whether under BPS or APDA is that it doesn’t matter at all whether or not there’s empirical evidence for claims but it only matters whether or not claims seem sensible.
        They train people in a mindset that devalues science as a way to resolve uncertainty. As a result very accomplished and smart debaters believe stupid things that no rationalist would. There’s one example of a person who was amazing to me in the amount of debating skill and intelligence combined with conspiracy theory beliefs.
        For the record, APDA has an additional problem: it focuses on whether points that were raised were addressed and little on their quality, which gets people to speak extremely fast, because the faster a debater under APDA speaks, the more points they can make and address. BPS cares at least about argument quality and not just quantity.
        Part of why I investigated it was because I was interested in improving.
        To the extent that’s true, how about making a proposal about how you think specificity should be brought into debates if you don’t like the way BPS rules do it?