The Golden Age of Data

The more recent something is, the more data we have about it.


This is true in all fields where collecting information over time is important, including astronomy, archaeology, geology, paleontology, the internet, and even our own memories. Why is this so universal? It’s not as if 100-year-old rocks are any more complicated than 300-year-old rocks.


There are three factors that create this effect. The first is that information becomes less accessible over time: it degrades. In geology, old rocks are worn away by water or buried under deep sand. In astronomy, starlight shifts into undetectable wavelengths as it travels through space. In the brain, memories are reevaluated, modified, and deleted. In these fields, roughly the same volume of data is generated at every point in time, but older data has had more time to degrade, so less of it survives.


There are some fields where data generation isn’t constant, and there the situation is more complicated. History is the most important example. Historical data is produced by humans, so the rate of historical data production follows a roughly exponential curve that tracks the global population. As the population increases, there are more people to produce records, so more records are produced. This is the second factor.
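The first two factors can be put into a toy model. The sketch below is only an illustration: the exponential decay, the 500-year half-life, and the 0.5% yearly growth rate are assumptions chosen for the example, not measurements, and the function name surviving_records is hypothetical.

```python
import math


def surviving_records(age_years, half_life_years=500.0, growth_rate=0.0):
    """Relative volume of records of a given age that survive to the present.

    Assumptions (illustrative, not taken from real data):
    - survival decays exponentially with a fixed half-life (factor one);
    - production grew exponentially at `growth_rate` per year (factor two),
      normalised so that 1.0 unit per year is produced today.
    With growth_rate=0.0, production is constant over time.
    """
    produced = math.exp(-growth_rate * age_years)
    surviving_fraction = 0.5 ** (age_years / half_life_years)
    return produced * surviving_fraction


if __name__ == "__main__":
    for age in (0, 100, 300, 1000):
        constant = surviving_records(age)
        growing = surviving_records(age, growth_rate=0.005)
        print(f"{age:>5} years old: constant={constant:.3f}, growing={growing:.3f}")
```

Under these assumptions, degradation alone already makes older records scarcer, and a growing population steepens the same curve rather than changing its shape.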

The third factor is innovation. Throughout history, humans have invented increasingly efficient methods of storing information. Papyrus is cheaper than clay tablets, and books are more compact than scrolls. These innovations happen rarely, but when they do, they are revolutionary: historians today see a spike in information from the mid-15th century not because of population growth in the late Middle Ages, but because of the printing press.


The pace of data innovation was very slow before the Information Age, but it has since rocketed forward. Over the past few decades, hard disk drives and SSDs have grown in capacity and fallen in cost at an astounding rate. The text of every English book ever published could plausibly fit on a single large hard drive. In fact, we no longer even need to carry storage media; we can store information on servers thousands of miles away, and those servers become more useful as bandwidth and upload speeds increase. The acceleration of the Information Age is hard to overstate: for most of history, humanity stored information in only a handful of ways, while today new formats and storage methods appear constantly. It is likely that this trend has now overtaken degradation and population growth as the greatest contributor to the exponential curve of historical data. Not only is the population still rising, but now, for the first time in history, the average person produces more data year after year.


This innovation comes with a trade-off. As data storage becomes more compact, data becomes easier to destroy. Words etched in stone persist after exposure to fire and water, whereas paper is destroyed by either. Hard drives commonly fail within about five years, if they aren’t destroyed by radiation first. All data degrades over time, but modern data degrades faster: it is less resilient. Ironically, while the Information Age produces more data than any other era, it also loses more. The way we access modern information adds to the fragility. The individual has better access to information now than ever before, but this freedom stands upon many fragile layers of technology. If any layer were to stop working (internet protocols, silicon chips, proprietary formats), access could be lost, perhaps permanently.

In the 2020s and 2030s, the trend toward more digital and abstract forms of storage will probably continue. We might expect future historians to find a superabundance of information from the 2020s and beyond, but that expectation is at odds with diminishing data resilience. The majority of our information is already digital. Book publication may be stagnating, and the digital word may someday make the printed word obsolete. Since digital media degrade so quickly, what will be left to historians?


Imagine that physical books become mostly obsolete by, say, the 2070s, and a global cataclysm causes a collapse of civilization in 2080. How can survivors piece together pre-apocalyptic history? By the time they secure subsistence-level farming, most servers have succumbed to lack of climate control, and encryption might make their data inaccessible anyway. Solid state drives and flash drives, which may fail or lose their data within a few decades, are mostly useless. The more abstract the information, the less accessible it will be to future historians. Therefore, they might turn to books: those that aren’t lost to fire can last for many decades, creating an abundant record of the past. This record tapers off during the mid-21st century and dries up completely afterwards, because nearly all the data created later was stored in inaccessible electronic forms.


In such a scenario, for future historians, there will be more information available from the late 20th and early 21st centuries than from any other era. These archivists might see the late 21st century as a second dark age, replete with data that is corrupted and inaccessible. The early 21st century, in stark contrast, would be a golden age of data. We are living in this golden age.


This also applies in less drastic scenarios. Today, something as simple as a water leak in the wrong server room can render inaccessible, or even destroy, information crucial to a government or organization, to say nothing of a greater natural disaster. Much has been said about the possibility of a coronal mass ejection (CME) completely disabling the modern world; in fact, the US has made plans for defense against them. Yet CMEs were once so inconsequential to human affairs that they were hardly noticed until the 19th century. All this comes, of course, in addition to better-known threats like cyberattacks or system-crippling bugs. A complex system has more ways to fail: why should we expect today’s information to persist?


If this risk is real, how should the people of today respond? We could create redundant, low-tech records of the present for future generations. One method is to publish an encyclopedia of the advances and events of each decade and distribute copies throughout the world. Would this type of “data insurance” justify the cost? Just as we are fascinated with the past, future generations will surely be interested in our own era. Should they, too, be participants in the modern data economy?
