Blessed information, garbage information, cursed information

This post is also available on my substack. Thanks to Justis Mills for editing and feedback.

Imagine that you’re a devops engineer who has been tasked with solving an incident where a customer reports bad performance. You can look through the logs of their server, but there are millions of log lines, and likely only a few of them are relevant to the issue. Thus, the logs are basically “garbage information”.


Rather than looking at a giant pool of unfiltered information, what you really need is highly distilled information that’s specifically optimized for solving this performance issue. For instance, you could ask the user for more information about precisely what they were doing, or use filters to get the logs for exactly the parts of the application they were dealing with, or look through the places where the server spent a very large amount of time. The more a piece of information has been made to help you, the more “blessed” it is, with the extreme end of blessedness being information that keeps surprising you in its usefulness.
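As a rough illustration of the last two options, here is a minimal Python sketch that keeps only the log records from one part of the application within a time window, and then ranks them by how long the server spent on them. The `LogRecord` fields, the `distill` helper and the sample data are all hypothetical, made up for this example; real logs would need their own parsing.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    # All fields are hypothetical; real log formats will differ.
    minute: int         # minutes since the start of the week
    component: str      # part of the application that emitted the line
    duration_ms: float  # time the server spent on the operation
    message: str


def distill(records, component, start_minute, end_minute, top_n=3):
    """Keep records from one component inside a time window, slowest first."""
    relevant = [
        r for r in records
        if r.component == component and start_minute <= r.minute <= end_minute
    ]
    relevant.sort(key=lambda r: r.duration_ms, reverse=True)
    return relevant[:top_n]


# Made-up sample data standing in for millions of log lines.
records = [
    LogRecord(100, "checkout", 12.0, "cart loaded"),
    LogRecord(101, "search", 8.0, "query ok"),
    LogRecord(101, "checkout", 4300.0, "payment provider call"),
    LogRecord(102, "checkout", 15.0, "order saved"),
    LogRecord(900, "checkout", 5000.0, "nightly report"),
]

slowest = distill(records, "checkout", 100, 105)
for r in slowest:
    print(r.minute, r.duration_ms, r.message)
```

Note that the slowest record overall (the nightly report) is excluded because it falls outside the window the customer was complaining about; the filters and the ranking do different jobs.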

It might be tempting to think you could use multivariate statistics like factor analysis to distill garbage information by identifying axes which give you an unusually large amount of information about the system. In my experience, that doesn’t work well, and if you think about it for a bit, it becomes clear why: if the garbage information has a 50 000 : 1 ratio of garbage : blessed, then finding an axis which explains 10 variables’ worth of information still leaves you with a 5 000 : 1 ratio of garbage : blessed. The distillation you get with such techniques is simply not strong enough.[1][2]

A 50 000 : 1 ratio might sound insurmountable by any technique, but because strong evidence is common, it’s actually pretty feasible; e.g. a week has about 10 000 minutes, so knowing which minute of the week an incident occurred in already gets you a filter within a factor of five of this strength.
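The arithmetic behind these two comparisons can be sketched in a few lines of Python. The `remaining_ratio` helper is a name I made up for this example, and it assumes an idealized filter that discards only garbage and keeps every blessed item, which real filters won’t quite manage.

```python
def remaining_ratio(garbage_per_blessed, filter_strength):
    """Garbage : blessed ratio left after a filter that removes only garbage.

    Assumes an idealized filter: it cuts garbage by `filter_strength`
    and never discards a blessed item.
    """
    return garbage_per_blessed / filter_strength


MINUTES_PER_WEEK = 7 * 24 * 60  # 10 080

# An axis explaining 10 variables' worth of information.
after_factor_axis = remaining_ratio(50_000, 10)

# Knowing which minute of the week the incident occurred in.
after_minute_filter = remaining_ratio(50_000, MINUTES_PER_WEEK)

print(f"after factor axis:   {after_factor_axis:.0f} : 1")
print(f"after minute filter: {after_minute_filter:.1f} : 1")
```

Under these assumptions the factor-analysis axis leaves thousands of garbage items per blessed one, while the timestamp alone leaves only a handful.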


While blessed information is actively helpful, and garbage information is essentially useless, there’s also a third case: information that leads you down the wrong road. If an incident is labelled as “everything is slow”, then that may very well get it more highly prioritized through customer service, but if most things aren’t slow and the engineer investigates as if they were, that ends up burning more engineer time than if the incident had been labelled accurately. Actively misleading information could be called “cursed information”.

Information doesn’t have to be literally false in order for it to mislead. Often, people use information to infer the presence of adjacent latent variables outside of the literal meaning of that information. For instance, “the website is slow to load” might be taken to mean “the server is slow”, which would be misleading if the real cause is that the user is on a very slow network connection.

Cursed information doesn’t just have the first-order harm caused by people believing it. It also has a second-order harm, as people develop filters so they don’t end up believing cursed information. One such filter is verifying all the information you are given, which is costly. Another such filter is just ignoring most of what you are told, which loses one of the most effective means of learning information.


Blessed information can be expensive to produce, and cursed information can be hard to destroy and disincentivize. So one cannot expect all information to be blessed, nor expect no information to be cursed. But if you are dealing with information, especially if you are spreading information, it may still be good to ask yourself: is this blessed, garbage or cursed? If the first, great! If the last, maybe reconsider what you are doing.

The distinction between blessed, garbage and cursed information is value-laden, because it depends on what you are trying to do. However, I find that there is relatively little ambiguity in practice, in the moment, as one is trying to solve some specific task.

The distinction between whether something is blessed or cursed becomes unambiguous because there is a relatively small set of people involved who have any influence on the task, and these people tend to have relatively clearly defined roles. Even when we have conflicting interests, we are part of a shared project, and the organization(s) that own this project have an interest in aligning our interests with each other.

This is obvious in the corporate setting that the engineer works in. Each of the people involved has a relatively small set of tasks that are efficient to work on, and each task has a relatively small set of solutions that are cheap to achieve. Because these sets are small, there’s also commonly a small set of variables that contain essentially all the information relevant for solving those tasks, and due to noise, almost all other variables are irrelevant, i.e. garbage. Of course, the logs exist for a reason; we expect some of them to be non-garbage with respect to some future tasks.

But it is also true (or can be made true) in many other scenarios. For instance, in personal relationships, the relationship partners are the main people who get impacted and have influence, so there arises a notion of whether information is blessed or cursed with respect to said relationship. If there is a conflict, then either person can take initiative to resolve said conflict.

  1. ^

    With one important caveat: in such methods, it is common to induce scale invariance, for instance by dividing by the standard deviation before doing PCA, or using probability-based methods to fit the factor model. If you don’t introduce scale invariance, then the long-tailedness of the data will basically force the biggest things to dominate in the results. But for getting blessed information, that is Actually Good: it is equivalent to looking through the places where the server spent a lot of time. This kind of stops being multivariate, though, as then there’s essentially only one variable that ends up driving the results.

  2. ^

    Once you do have a ton of blessed information, it can be helpful to apply multivariate methods to it to find components of it that are even more blessed. It just doesn’t work on pure garbage. And if one does apply it in this way, one has to remember that the residuals are blessed too.