[Question] How Can People Evaluate Complex Questions Consistently?

I’m doing a project on how humans can evaluate messy problems and come up with consistent answers (consistent both with themselves over time and with other people), and what the trade-off with accuracy is. This isn’t a single unified field, so I need to go poking for bits of it in lots of places. Where would people suggest I look? I’m especially interested in information on consistently evaluating novel questions that don’t have enough data to support statistical models (“When will [country] develop the nuclear bomb?”), as opposed to questions for which we have enough data that we’re pretty much just looking for similarities (“Does this biopsy reveal cancer?”).

An incomplete list of places I have looked or plan on looking at:

  • interrater reliability (a sketch of one common metric, Cohen’s kappa, follows this list)

  • test-retest reliability

  • educational rubrics (for both student and teacher evaluations)

  • medical decision making / standard of care

  • Daniel Kahneman’s work

  • Philip Tetlock’s work

  • The Handbook of Inter-Rater Reliability
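
To make the first bullet concrete, here is a minimal sketch of Cohen’s kappa, one standard interrater-reliability statistic: it measures how much two raters agree beyond what their label frequencies would produce by chance. The raters and their labels below are made-up example data, not from any real study.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters over the same items."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement: probability the raters agree if each labels items
    # independently according to their own marginal label frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(ratings_a) | set(ratings_b)
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Two hypothetical raters grading the same ten forecasts as good/bad.
rater_1 = ["good", "good", "bad", "good", "bad", "bad", "good", "good", "bad", "good"]
rater_2 = ["good", "bad",  "bad", "good", "bad", "good", "good", "good", "bad", "bad"]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # 1.0 = perfect, 0 = chance
```

The chance correction is the interesting part for messy questions: raw percent agreement looks inflated whenever one answer dominates, whereas kappa near zero flags that the raters are not really converging on a shared judgment.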