Self-fulfilling correlations

Correlation does not imply causation. Sometimes corr(X,Y) means X=>Y; sometimes it means Y=>X; sometimes it means W=>X, W=>Y. And sometimes it’s an artifact of people’s beliefs about corr(X, Y). With intelligent agents, perceived causation causes correlation.

Volvos are believed by many people to be safe. Volvo has an excellent record of being concerned with safety; they introduced 3-point seat belts, crumple zones, laminated windshields, and safety cages, among other things. But how would you evaluate the claim that Volvos are safer than other cars?

Presumably, you’d look at the accident rate for Volvos compared to the accident rate for similar cars driven by a similar demographic, as reflected, for instance, in insurance rates. (My google-fu did not find accident rates posted on the internet, but insurance rates don’t come out especially pro-Volvo.) But suppose the results showed that Volvos had only 3/4 as many accidents as similar cars driven by similar people. Would that prove Volvos are safer?

Perceived causation causes correlation

No. Besides having a reputation for safety, Volvos also have a reputation for being overpriced and ugly, so it is mostly people concerned about safety who buy them. Once the reputation exists, even if it’s not true, a cycle begins that feeds on itself: cautious drivers buy Volvos and have fewer accidents, which produces better statistics, which leads more cautious drivers to buy Volvos.
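To make the cycle concrete, here is a toy simulation. It is a sketch, not a model of real drivers: the `caution` trait, the accident probabilities, and the rule for updating `reputation` are all numbers I made up. The one thing that matters is that the car itself never affects the accident probability.

```python
import random

def reputation_cycle(rounds=5, n=200_000, reputation=0.2, seed=0):
    """Return [(reputation, volvo/other accident ratio)] for each round.

    Hypothetical model: the car has NO causal effect on accidents.
    """
    rng = random.Random(seed)
    history = []
    for _round in range(rounds):
        drivers = {"volvo": 0, "other": 0}
        accidents = {"volvo": 0, "other": 0}
        for _ in range(n):
            caution = rng.random()  # made-up trait, uniform on [0, 1)
            # Assumption: the stronger the reputation, the more strongly
            # cautious drivers sort themselves into Volvos.
            p_volvo = 0.5 + reputation * (caution - 0.5)
            car = "volvo" if rng.random() < p_volvo else "other"
            # Assumption: accident risk depends only on the driver.
            p_accident = 0.02 + 0.10 * (1.0 - caution)
            drivers[car] += 1
            accidents[car] += rng.random() < p_accident
        volvo_rate = accidents["volvo"] / drivers["volvo"]
        other_rate = accidents["other"] / drivers["other"]
        ratio = volvo_rate / other_rate
        history.append((reputation, ratio))
        # Assumption: good-looking statistics strengthen the reputation.
        reputation = min(1.0, reputation + (1.0 - ratio))
    return history

history = reputation_cycle()
for reputation, ratio in history:
    print(f"reputation={reputation:.2f}  accident ratio={ratio:.2f}")
```

Under these assumptions the accident ratio starts below 1 and the reputation ratchets upward each round, even though swapping any driver into the other car would change nothing.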

Do Montessori schools or home-schooling result in better scores on standardized tests? I’d bet that they do. Again, my google-fu is not strong enough to find any actual reports on, say, average SAT-score increases for students in Montessori schools vs. public schools. But the largest observable factor determining student test scores, last I heard, is participation by the parents. Any new education method will show increases in student test scores if people believe it results in increases in student test scores, because only interested parents will sign up for that method. The crazier, more-expensive, and more-difficult the method is, the more improvement it should show; craziness should filter out less-committed parents.

Are vegetarian diets or yoga healthy for you? Does using the phone while driving increase accident rates? Yes, probably; but there is a self-fulfilling component in the data that is difficult to factor out.

Conditions under which this occurs

If you believe X helps you achieve Y, and so you use X when you are most motivated to achieve Y, and your motivation has some bearing on the outcome, you will observe a correlation between X and Y even if X does nothing.

This won’t happen if your motivation or attitude has no bearing on the outcome (beyond your choice of X). If passengers prefer one airline based on their perception of its safety, that won’t make its safety record improve.
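The two conditions above can be checked with a small sketch. The parameter `motivation_effect` is my own name for "how much motivation bears on the outcome", and the probabilities are arbitrary. X has no effect in either run. The only thing that changes is whether motivation matters.

```python
import random

def outcome_gap(motivation_effect, n=200_000, seed=1):
    """Success rate of X-users minus non-users, when X itself does nothing."""
    rng = random.Random(seed)
    used = [0, 0]     # [people, successes] among those who chose X
    skipped = [0, 0]  # same, among those who did not
    for _ in range(n):
        motivation = rng.random()
        # Assumption: the more motivated you are, the likelier you pick X.
        chose_x = rng.random() < motivation
        # Success depends on motivation only through motivation_effect.
        success = rng.random() < 0.3 + motivation_effect * motivation
        group = used if chose_x else skipped
        group[0] += 1
        group[1] += success
    return used[1] / used[0] - skipped[1] / skipped[0]

gap_when_motivation_matters = outcome_gap(0.4)  # the Volvo-like case
gap_when_it_does_not = outcome_gap(0.0)         # the airline-like case
print(gap_when_motivation_matters, gap_when_it_does_not)
```

With `motivation_effect=0.4` a clear gap appears in favor of X. With `motivation_effect=0.0`, the airline case, the gap should be indistinguishable from zero.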

However, this is different from either confidence or the placebo effect. I’m not talking about the PUA mantra that “if you believe a pickup line will work, it will work”. And I’m not talking about feeling better when you take a pill that you think will help you feel better. This is a sample-selection bias. A person is more likely to choose X when they are motivated to achieve Y relative to other possible positive outcomes of X, and hence more inclined to make many other little trade-offs to achieve Y which will not be visible in the data set.

It’s also not the effect people are guarding against with double-blind experiments. Blinding guards against the expectations of experimenters (and subjects) favoring one method over another. This is, rather, an effect guarded against with random assignment to different groups.
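Here is a sketch of why random assignment helps, using the same kind of made-up population as before (a hypothetical `motivation` trait that drives both the choice of X and the outcome, with X doing nothing). The only difference between the two runs is who decides which group a person lands in.

```python
import random

def observed_gap(randomize, n=200_000, seed=2):
    """Success rate with X minus without X; the true effect of X is zero."""
    rng = random.Random(seed)
    with_x = [0, 0]     # [people, successes]
    without_x = [0, 0]
    for _ in range(n):
        motivation = rng.random()
        if randomize:
            gets_x = rng.random() < 0.5         # coin flip ignores motivation
        else:
            gets_x = rng.random() < motivation  # people sort themselves
        # Assumption: the outcome depends on motivation, never on X.
        success = rng.random() < 0.3 + 0.4 * motivation
        group = with_x if gets_x else without_x
        group[0] += 1
        group[1] += success
    return with_x[1] / with_x[0] - without_x[1] / without_x[0]

self_selected_gap = observed_gap(randomize=False)
randomized_gap = observed_gap(randomize=True)
print(self_selected_gap, randomized_gap)
```

When people choose for themselves, X looks effective. When a coin chooses, both groups contain the same mix of motivation and the apparent effect should shrink to noise.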

Nor should it happen in cases where the outcome being studied is the only outcome people consider. If a Montessori school cost the same, and was just as convenient for the parents, as every other school, and all factors other than test score were equal, and Montessori schools were believed to increase test scores, then any parent who cared at all would choose the Montessori school. The filtering effect would vanish, and so would the portion of the test-score increase caused by it. Same story if one choice improves all the outcomes under consideration: Aluminum tennis racquets are better than wooden racquets in weight, sweet spot size, bounce, strength, air resistance, longevity, time between restrings, and cost. You need not suspect a self-fulfilling correlation.

It may be cancelled by a balancing effect, when you are more highly motivated to achieve Y in exactly the situations where you are less likely to achieve Y. In sports, if you wear your lucky undershirt only for tough games, you’ll find it appears to be unlucky, because you’re more likely to lose tough games. Another balancing effect is if your choice of X makes you feel so confident of attaining Y that you act less concerned about Y; an example is (IIRC) research showing that people wearing seat belts are more likely to get into accidents.
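The undershirt example is easy to simulate. In this sketch the win probabilities (30% for tough games, 70% for easy ones) are invented, and the shirt has no effect on anything.

```python
import random

def undershirt_record(games=100_000, seed=3):
    """Win rate with and without the lucky undershirt (which does nothing)."""
    rng = random.Random(seed)
    record = {True: [0, 0], False: [0, 0]}  # wore_shirt -> [games, wins]
    for _ in range(games):
        tough = rng.random() < 0.5
        wore_shirt = tough  # the shirt comes out only for tough games
        # Assumption: winning depends only on the opponent.
        won = rng.random() < (0.3 if tough else 0.7)
        record[wore_shirt][0] += 1
        record[wore_shirt][1] += won
    return {shirt: wins / played for shirt, (played, wins) in record.items()}

win_rates = undershirt_record()
print(f"with shirt: {win_rates[True]:.2f}, without: {win_rates[False]:.2f}")
```

Here the selection runs in the opposite direction from the Volvo case: the "treatment" is applied exactly when the odds are worst, so it picks up the blame.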

Application to machine learning and smart people

Back in the late 1980s, neural networks were hot; and evaluations usually indicated that they outperformed other methods of classification. In the early 1990s, genetic algorithms were hot; and evaluations usually indicated that they outperformed other methods of classification. Today, support vector machines (SVMs) are hot; and evaluations usually indicate that they outperform other methods of classification. Neural networks and genetic algorithms no longer outperform older methods. (I write this from memory, so you shouldn’t take it as gospel.)

There is a publication bias: When a new technology appears, publications indicating it performs well are interesting. Once it’s established, publications indicating it performs poorly are interesting. But there’s also a selection bias. People strongly motivated to make their systems work well on difficult problems are strongly motivated to try new techniques; and also to fiddle with the parameters until they work well.

Fads can create self-fulfilling correlations. If neural networks are hot, the smartest people tend to work on neural networks. When you compare their results to other results, it can be difficult to compare neural networks with, say, logistic regression while factoring out the difference between the smartest people and merely pretty smart people.
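A toy version of the fad effect, with the loud caveat that it assumes something we never have in practice: a measurable `skill` score for each researcher. The accuracy formula is invented, and the method ("hot" or "old") has no effect on it. The sketch compares the raw gap between methods with the gap inside groups of similarly skilled people.

```python
import random
from statistics import mean

def fad_comparison(n=50_000, seed=4):
    """Return (raw accuracy gap, gap averaged within skill deciles)."""
    rng = random.Random(seed)
    results = []  # (skill decile, method, reported accuracy)
    for _ in range(n):
        skill = rng.random()
        # Assumption: the more skilled you are, the likelier you chase the fad.
        method = "hot" if rng.random() < skill else "old"
        # Assumption: accuracy depends on skill (tuning, care), not on method.
        accuracy = 0.70 + 0.20 * skill + rng.gauss(0, 0.02)
        results.append((int(skill * 10), method, accuracy))

    def gap(rows):
        hot = [a for _, m, a in rows if m == "hot"]
        old = [a for _, m, a in rows if m == "old"]
        return mean(hot) - mean(old)

    raw_gap = gap(results)
    # Compare like with like: only researchers in the same skill decile.
    stratified_gap = mean(
        gap([row for row in results if row[0] == decile]) for decile in range(10)
    )
    return raw_gap, stratified_gap

raw_gap, stratified_gap = fad_comparison()
print(f"raw gap: {raw_gap:.3f}, within-decile gap: {stratified_gap:.3f}")
```

The raw comparison credits the hot method with an advantage that mostly disappears once similar researchers are compared. The hard part in real evaluations is that nothing like the `skill` column is available.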

(The attention of smart people is a proxy for effectiveness, which often misleads other smart people—e.g., the popularity of communism among academics in America in the 1930s. But that’s yet another separate issue.)