Compositionality: SQL and Subways

Programmers like to talk about a concept called compositionality, which is the idea that programs and code should consist of small, modular components that fit together cleanly. This property is considered almost a holy grail of software design, but there’s not a lot of clarity about what it means in concrete terms. In this post I want to offer a working definition for compositionality, and offer some examples of sofware—and infrastructure! - that meets the definition. I’ll also look at some structures that fall short, and try to drill down into why they failed. According to my definition, a system of tools is composeable if it meets the following conditions:

Parsimony: there is a small number of tools.
Efficiency: the tools are highly optimized (often this is possible because there is only a small number of them).
Interoperability: the output of one tool can be used as the input to another tool without a lot of additional work. More generally, the tools fit together cleanly.
Practicality: Users can combine the tools together to perform tasks that are useful and significant in the real world.

A system that has these properties creates huge value for its users. The system is easy to learn, since there are just a small number of basic operations. But these primitives can be composed together to create an exponentially large set of complex operations, which will be efficient because they are made up of efficient components. Finally, if the Practicality requirement is met, the large set of possible operations will overlap substantially with thet set of tasks that people actually want to perform.

Discussions of compositionality often start by mentioning the Unix Philosophy of software design, and I’ll follow that precedent. This philosophy emphasizes building small, modular programs, that can be connected together easily. Unix offers an elegant tool for chaining programs together called the Pipe, which causes the output of one program to be used as the input of the next program (in between stages, the data is stored in an OS-managed buffer that the programs don’t need to know about). This technique allows the user to create powerful applications just by typing a few commands together at a prompt. For example, several years ago I had a job at an ad-tech company, where we had huge logs of tab-separated data that represented the billions of ads the company bought for our clients. The business guys would often ask me, “hey Dan, how many ads did we buy for JetBlue and serve on CNN.com?” I would answer this question by running a command like this:

zcat 2014-04-04/*.log.Z | cut -f 4, 18 | grep "cnn.com″ | grep 18433 | wc

This command would unzip all the log files in a directory, extract two columns of interest, and search for the target domain and the client ID, and then count all the resulting lines. At the time, it felt very powerful to be able to run such a significant operation without using any actual code. This power comes from the compositionality of the Unix suite of tools. The core set of commands is relatively small. The commands are written in C, so they start up and run fast. Furthermore, since each program in the chain runs in a separate process, the approach is parallel by default, so the user can easily take advantage of a multi-CPU machine. The pipe mechanism, combined with good design of the utilities, means that the suite is quite interoperable. Finally, the real-world utility of this technique is proved by the fact that many people are able to use it to get real work done.

When I started using this technique, I thought it was awesome, and I dreamed of ways to replace all my bulky Java and Python code with chains of Unix commands. Nowadays, I no longer see the Unix command suite as the pinnacle of compositionality, for a couple of reasons. First, many of the commands don’t offer options that one would naturally expect, and this reduces their combinatorial utility. For example, at one point long ago wanted to get the total disk usage of all the files in a directory that matched a certain pattern. I was sure that I could do this by creating a chain of commands like this:

ls -al | grep "MyPattern" | cut -f 3 | sum

Unfortunately, this doesn’t work because ls doesn’t tab-separate its output, so it can’t be easily fed into the cut command (Unix zealots, please don’t chide me about how this can be done easily if I only knew the right options for ls or some other command. The point is that it wasn’t easy). More subtly, the Unix pipe approach is not as cleanly interoperable as it might initially seem. The primitive commands do not expose the data structure of their inputs and outputs, so there is no easy way to tell if a chain of commands is structurally valid without just running it.

In my view, a better candidate for the Most Composeable prize is SQL, which is probably the most widely used software on the planet. SQL is made up a small number of basic commands with names like SELECT, UPDATE, INSERT, and DELETE. These commands have a simple, common grammar. For example, SELECT, UPDATE, and DELETE commands can all be modified by WHERE clauses, in a way that is extremely intuitive and works in the same way for each command. SELECT statements can be transformed from simple queries into aggregations by the use of statements like GROUP BY, and the resulting object has exactly the same structure. Thus, for example, you can perform an aggregation using a SELECT and a GROUP BY, and then INSERT the result into another table, which can then be queried, updated, or deleted using subsequent SELECT, UPDATE, or DELETE statements. Databases engines can achieve great performance by analyzing a query and planning how to execute it efficiently - this analysis works because SQL uses a small number of core operations. And, of course, SQL is at the core of a huge number of software applications, proving its practical utility.

While I’m still on the subject of software, I want to mention a couple of systems that I do not consider to exemplify compositionality. The first negative example is just a fully general programming language like Python or Java. Some languages are better than others, but most people would agree that languages themselves are not intrinsically composeable. The reason is that, because they are fully general, languages are essentially too complex. The output of one function call is not guaranteed to fit into the input of another call. This might work in particular programs, if the engineer has been careful to build a good API, but typically programming involves a lot of data wrangling. The engineer has to make a REST call, then convert the JSON response into a Java Map, then feed the Map into a 3rd party library, and finally put the result into a database.

I’m a bit reluctant to mention this, since it probably exposes both my age and my dubious taste in software, but early in my career I was excited by a technology called Extensible Stylesheet Language Transformations (XSLT). XSLT was a language for transforming XML documents, and XSLT was itself a subset of XML. This design seemed offer a lot of potential: it was both minimalist (everything is just an XML document) and to offer meta-programming potential (you can transform your XSLT with XSLT). XSLT seemed like it would meet all the criteria for compositionality. There are a small number of data transformation operations that formed the basis of the language. Some smart software wizards would create optimized XSLT engines that were could perform transformations with high performance. The transformations could be chained together easily, since XML is both the input and the output. Finally, it seemed like this could be used for practical applications: I recall using XSLT in web applications to generate HTML pages from source data. While XSLT seemed promising at the start, after working with it for a while, I realized that it failed badly at code-length efficiency. The bulky syntax of XSLT meant that even simple tasks required lots of ugly XML tags and excessive typing (see this example).

Now that I’ve tested out this notion of compositionality with some examples from the world of bits, I want to try to extend it to the world of atoms. One of the best examples of the concept in the physical world is the Tokyo subway:

Tokyo Subway

As you can see, Tokyo has about 15 distinct subway lines that connect dozens of stations. The system provides combinatorial utility: a Tokyo denizen wakes up at his home near Ueno, takes the Hibiya line to Akihabara to buy some electronic gadgets for his son, and then takes the Yamanote line to Tokyo to buy his wife some perfume at a department store. Then he takes the Marunouchi line to Shinjuku to meet his friends for drinks (putting his gifts in a handy subway locker). Finally, at the end of the day, he drunkenly stumbles back onto the Yamanote line and goes home to Ueno. That story represents just one of billions of efficient paths that the system allows. This map doesn’t even actually do justice to the full power of the system, because it only shows subways and not the regular rail lines, which connect to the subway stations and bring travelers in from other cities.

Let’s look at the Tokyo subway network in relation to the criteria for compositionality. First, the system is made up of a relatively small number of lines (far fewer than the number of city streets). A subway train can service its line in a highly optimized way: compared to the automobile, it is much safer and more efficient in terms of total person-miles travelled per unit of energy. The fact that the subway lines connect at shared stations provides the clean input to output property. Finally, the system is plainly of immense practical value.

Now, subway systems aren’t automatically composeable. The Tokyo system has the quality because it is well designed, but not every city is so lucky. A notorious failure of the Bay Area Rapid Transit system is that it does not connect to CalTrain. This means the system’s individual components are not highly interoperable, and therefore prevents the system from achieving combinatorial utility.

Compared to subways, the road system is an illuminating negative example of compositionality. Roads obviously connect to one another: with the exception of some islands, you can get to every road location in North America from any other location. And roads are clearly of great practical value. But the cars-on-roads system fails badly on the efficiency criterion. Cars are actually quite dangerous, and their fuel inefficiency is a major contributor to climate change and other environmental problems. This system also fails the parsimony criterion: there are far too many cars and too many roads as well. The overabundance of roads makes it easy to get lost, and also consumes vast swaths of land area that could otherwise be dedicated to housing, parks, or other purposes.

As a final example, I want to look at an innovation that increased the level of compositionality in a set of pre-existing tools. That innovation is the standardized cargo container, or intermodal container. We’ve all seen thousands of these objects being hauled by many-wheeled trucks on the highway. Some people might not realize that these boxes can be lifted from the truckbed and placed directly onto a cargo ship. Furthermore, it was not until the 70s that the idea of using a standard container became common. Prior to that, when a batch of goods was delivered to a port by a truck, the goods would have to be unloaded from the truck and loaded onto the ship by human laborers.

Container Ship

How does the intermodal container fit into the concept of compositionality? Before the 70s, we had the same kinds of transport vehicles—trucks, trains, and ships. These transport tools were good enough on their own, but they were not cleanly interoperable. Loading and unloading the goods cost time, money, and risked damaging the goods. In some cases, the added cost meant that it was no longer profitable to transport the goods to remote nodes of the network. So the advent of the standardized container increased the interoperability of the system by an enormous degree. This increased the system’s overall compositionality, leading to an increase in combinatorial utility - now there are billions of inexpensive paths a product might travel through the global cargo network.

I think the principles of compositionality I mentioned above provide a good set of design goals for system designers. Create a small number of primitive operations. Make sure that those operations are highly optimized, and that they connect to each other as cleanly as possible. Finally, make sure that when you combine the primitives to create an exponentially large set of complex operations, this set overlaps substantially with the set of tasks that people actually want to perform.