Category Theory Without The Baggage
If you are an algebraic abstractologist, this post is probably not for you. Further meta-commentary can be found in the “meta” section, at the bottom of the post.
So you’ve heard of this thing called “category theory”. Maybe you’ve met some smart people who say that it’s really useful and powerful for… something. Maybe you’ve even cracked open a book or watched some lectures, only to find that the entire subject seems to have been generated by training GPT-2 on a mix of algebraic optometry and output from theproofistrivial.com.
What is this subject? What could one do with it, other than write opaque math papers?
This introduction is for you.
This post will cover just the bare-bones foundational pieces: categories, functors, and natural transformations. I will mostly eschew the typical presentation; my goal is just to convey intuition for what these things mean. Depending on interest, I may post a few more pieces in this vein, covering e.g. limits, adjunctions, the Yoneda lemma, symmetric monoidal categories, types and programming, etc. Leave a comment if you want to see more.
Category theory is the study of paths in graphs, so I’ll briefly talk about that and highlight some relevant aspects.
What’s a category? A category is just a graph with some notion of equivalence of paths; we’ll see a few examples.
Pattern matching: find a sub-category with a particular shape. Matches are called “functors”.
One sub-category modelling another: commutative squares and natural transformations.
Paths in Graphs
Here’s a graph:
Here are some paths in that graph:
A → B
B → C
A → B → C
A → A
A → A → A (twice around the loop)
A → A → A → B (twice around the loop, then to B)
(trivial path—start at D and don’t go anywhere)
(trivial path—start at A and don’t go anywhere)
In category theory, we usually care more about the edges and paths than the vertices themselves, so let’s give our edges their own names:
We can then write paths like this:
A → B is written y
B → C is written z
A → B → C is written yz
A → A is written x
A → A → A is written xx
A → A → A → B is written xxy
The trivial path at D is written id_D (this is roughly standard notation)
The trivial path at A is written id_A
We can build longer paths by “composing” shorter paths. For instance, we can compose y (aka A → B) with z (aka B → C) to form yz (aka A → B → C), or we can compose x with itself to form xx, or we can compose xx with yz to form xxyz. We can compose two paths if-and-only-if the second path starts where the first one ends; we can’t compose x with z because x ends at A while z starts at B, so we’d have to magically jump from A to B in the middle.
Composition is asymmetric—composing y with z is fine, but we can’t compose z with y.
Notice that composing id_A with x is just the same as x by itself: if we start at A, don’t go anywhere, and then follow x, then that’s the same as just following x. Similarly, composing x with id_A is just the same as x. Symbolically: id_A x = x id_A = x. Mathematically, id_A is an “identity”—an operation which does nothing; thus the “id” notation.
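To make the bookkeeping concrete, here is a minimal Python sketch, assuming a path is represented as a start vertex, an end vertex, and a tuple of edge names. The names `Path`, `then`, `edge`, and `identity` are my own, purely for illustration. Composition concatenates edge lists after checking that the second path starts where the first ends, and the identity path is just the empty edge list.

```python
from dataclasses import dataclass

# Edges of the example graph, named as in the text: name -> (start, end).
EDGES = {"x": ("A", "A"), "y": ("A", "B"), "z": ("B", "C")}


@dataclass(frozen=True)
class Path:
    start: str
    end: str
    edges: tuple = ()  # empty tuple means the trivial (identity) path

    def then(self, other: "Path") -> "Path":
        """Compose left to right: follow self, then other (like 'yz')."""
        if self.end != other.start:
            raise ValueError(f"cannot compose: path ends at {self.end}, "
                             f"next path starts at {other.start}")
        return Path(self.start, other.end, self.edges + other.edges)


def edge(name: str) -> Path:
    """A single edge, viewed as a path of length one."""
    start, end = EDGES[name]
    return Path(start, end, (name,))


def identity(vertex: str) -> Path:
    """The trivial path: start at `vertex` and don't go anywhere."""
    return Path(vertex, vertex)


x, y, z = edge("x"), edge("y"), edge("z")
print(x.then(x).then(y).then(z).edges)                      # ('x', 'x', 'y', 'z')
print(identity("A").then(x) == x == x.then(identity("A")))  # True
```

Calling `x.then(z)` raises a `ValueError`, matching the rule that a path can't jump from A to B in the middle.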
In applications, graphs almost always have data on them—attached to the vertices, the edges, or both. In category theory in particular, data is usually on the edges. When composing those edges to make paths, we also compose the data.
A simple example: imagine a graph of roads between cities. Each road has a distance. When composing multiple roads into paths, we add together the distances to find the total distance.
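A minimal sketch of the roads example in Python, assuming each road is stored as (from city, to city, distance); the city names and distances are made up. Composing roads into a path adds their distances, and the trivial path gets distance zero, which is exactly why it behaves like an identity.

```python
# Hypothetical road network: road name -> (from_city, to_city, distance_km).
ROADS = {
    "r1": ("Springfield", "Shelbyville", 30.0),
    "r2": ("Shelbyville", "Capital City", 45.0),
    "r3": ("Springfield", "Capital City", 80.0),
}


def path_distance(roads: list) -> float:
    """Total distance of a path given as a list of road names."""
    for prev, nxt in zip(roads, roads[1:]):
        # Each road must start in the city where the previous one ended.
        if ROADS[prev][1] != ROADS[nxt][0]:
            raise ValueError(f"{nxt} does not start where {prev} ends")
    # Composing edges composes their data: distances add up.
    return sum((ROADS[r][2] for r in roads), 0.0)


print(path_distance(["r1", "r2"]))  # 75.0
print(path_distance([]))            # 0.0 (trivial path)
```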
Finally, in our original graph, let’s throw in an extra edge from A to itself:
Our graph has become a “multigraph”: a graph with (potentially) more than one distinct edge between the same pair of vertices. Now we can’t just write a path as A → A → A anymore, since that could refer to xx, xx’, x’x, or x’x’. In category theory, we’ll usually be dealing with multigraphs, so we need to write paths as a sequence of edges rather than the vertices-with-arrows notation. For instance, in our roads-and-cities example, there may be multiple roads between any two cities, so a path needs to specify which roads are taken.
Category theorists call paths and their associated data “morphisms”. This is a terrible name, and we mostly won’t use it. Vertices are called “objects”, which is a less terrible name I might occasionally slip into.
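The ambiguity is easy to see by brute force. This Python sketch (with a hypothetical helper `paths_of_length`) lists every sequence of n edges that forms a path between two given vertices in the multigraph with the extra loop; x’ is written `x'` in code.

```python
from itertools import product

# The multigraph with the extra loop: name -> (start, end).
EDGES = {"x": ("A", "A"), "x'": ("A", "A"), "y": ("A", "B"), "z": ("B", "C")}


def paths_of_length(n: int, start: str, end: str) -> list:
    """All paths of exactly n >= 1 edges from start to end, as edge-name strings."""
    found = []
    for seq in product(EDGES, repeat=n):
        starts_right = EDGES[seq[0]][0] == start
        ends_right = EDGES[seq[-1]][1] == end
        # Each edge must start where the previous edge ended.
        connected = all(EDGES[a][1] == EDGES[b][0] for a, b in zip(seq, seq[1:]))
        if starts_right and ends_right and connected:
            found.append("".join(seq))
    return found


print(paths_of_length(2, "A", "A"))  # ['xx', "xx'", "x'x", "x'x'"]
```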
What’s a category?
A category is:
a directed multigraph
with some notion of equivalence between paths.
For instance, we could imagine a directed multigraph of flights between airports, with a cost for each flight. A path is then a sequence of flights from one airport to another. As a notion of equivalence, we could declare that two paths are equivalent if they have the same start and end points, and the same total cost.
There is one important rule: our notion of path-equivalence must respect composition. If path p is equivalent to q (which I’ll write $p \simeq q$), and $x \simeq y$, then we must have $px \simeq qy$. In our airports example, this would say: if two flight-paths p and q have the same cost (call it $c_1$), and two flight-paths x and y have the same cost (call it $c_2$), then the cost of px (i.e. $c_1 + c_2$) must equal the cost of qy (also $c_1 + c_2$).
Besides that, there’s a handful of boilerplate rules:
Any path is equivalent to itself (reflexivity); if $p \simeq q$ then $q \simeq p$ (symmetry); and if $p \simeq q$ and $q \simeq r$ then $p \simeq r$ (transitivity). These are the usual rules which define equivalence relations.
Any paths with different start and end points must not be equivalent; otherwise expressions like “$px \simeq qy$” might not even be defined.
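As a sanity check of the composition rule in the airports example, here is a small Python sketch, assuming a flight-path is summarized by its start, end, and total cost; the airport codes, flight identifiers, and prices are invented. Equivalence means same start, same end, and same total cost, and composing equivalent pairs gives equivalent results because costs add.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlightPath:
    start: str
    end: str
    cost: int            # total cost in (hypothetical) dollars
    flights: tuple = ()  # flight identifiers; not used for equivalence

    def then(self, other: "FlightPath") -> "FlightPath":
        """Compose: fly self, then other. Costs add."""
        if self.end != other.start:
            raise ValueError("second path must start where the first ends")
        return FlightPath(self.start, other.end, self.cost + other.cost,
                          self.flights + other.flights)


def equivalent(p: FlightPath, q: FlightPath) -> bool:
    """Our chosen notion of equivalence: same endpoints, same total cost."""
    return (p.start, p.end, p.cost) == (q.start, q.end, q.cost)


p = FlightPath("JFK", "ATL", 200, ("F1",))
q = FlightPath("JFK", "CLT", 120, ("F2",)).then(FlightPath("CLT", "ATL", 80, ("F3",)))
x = FlightPath("ATL", "LAX", 150, ("F4",))
y = FlightPath("ATL", "DEN", 100, ("F5",)).then(FlightPath("DEN", "LAX", 50, ("F6",)))

print(equivalent(p, q), equivalent(x, y))  # True True
print(equivalent(p.then(x), q.then(y)))    # True
```

Note that `p.then(x)` and `q.then(y)` are still different paths (different flights); they are merely equivalent.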
Let’s look at a few more examples. I’ll try to show some qualitatively different categories, to give some idea of the range available.
Airports & Flights
Our airport example is already a fairly general category, but we could easily add more bells and whistles to it. Rather than having a vertex for each airport, we could have a vertex for each airport at each time. Flights then connect an airport at one time to another airport at another time, and we need some zero-cost “wait” edges to move from an airport at one time to the same airport at a later time. A path would be some combination of flights and waiting. We might expect that the category has some symmetries—e.g. “same flights on different days”—and later we’ll see some tools to formalize those.
As a completely different example, consider the category of divisibility of positive integers:
This category has a path from n to m if-and-only-if n is divisible by m (written m | n, pronounced “m divides n”; e.g. 2 | 12 is read “two divides twelve”). The “data” on the edges is just the divisibility relations, e.g. 6 | 12 or 5 | 15:
We can compose these: 2|6 and 6|12 imply 2|12. A path 12 → 6 → 2 in this category is, in some sense, a proof that 12 is divisible by 2 (given all the divisibility relations on the edges). Note that any two paths from 12 to 2 produce the same result; e.g. 12 → 4 → 2 also gives 2|12. More generally: in this category, any two paths between the same start and end points are equivalent.
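A minimal Python sketch of this category, assuming a path is written as a chain of positive integers in which each entry divides the one before it (the function names are my own). What a path “proves” depends only on its endpoints, which is exactly why any two paths with the same endpoints are equivalent here.

```python
def is_path(chain: list) -> bool:
    """[12, 6, 2] is a path: 6 | 12, then 2 | 6."""
    return all(prev % nxt == 0 for prev, nxt in zip(chain, chain[1:]))


def conclusion(chain: list) -> tuple:
    """What the path proves: (m, n), read as m | n."""
    if not chain or not is_path(chain):
        raise ValueError(f"{chain} is not a path in the divisibility category")
    return chain[-1], chain[0]


print(conclusion([12, 6, 2]))  # (2, 12)
print(conclusion([12, 4, 2]))  # (2, 12): a different path, the same result
print(conclusion([12]))        # (12, 12): the identity path, 12 | 12
```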
Types & Functions
Yet another totally different direction: consider the category of types in some programming language, with functions between those types as edges:
This category has a LOT of stuff in it. There’s a function for addition of two integers, which goes from (int, int) to int. There’s another function for multiplication of two integers, also from (int, int) to int. There are functions operating on lists, strings, and hash tables. There are functions which haven’t been written in the entire history of programming, with input and output types which also haven’t been written.
We know how to compose functions—just call one on the result of the other. We also know when two functions are “equivalent”—they always give the same output when given the same input. So we have a category, using our usual notions of composition and equivalence of functions. This category is the main focus of many CS applications of category theory (e.g. types in Haskell). Mathematicians instead focus on the closely-related category of functions between sets; this is exactly the same except that functions go from one set to another instead of one type to another.
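In Python, composition and equivalence might be sketched as follows (the names `compose` and `agree_on` are illustrative, not from any library). One caveat: “always give the same output” quantifies over every input, so a program can only spot-check equivalence on samples. A mismatch refutes equivalence, but agreement on samples does not prove it.

```python
from typing import Callable, Iterable


def compose(f: Callable, g: Callable) -> Callable:
    """Left-to-right composition, like paths: apply f, then g."""
    return lambda a: g(f(a))


def agree_on(f: Callable, g: Callable, samples: Iterable) -> bool:
    """Spot-check equivalence: True if f and g agree on every sample input."""
    return all(f(s) == g(s) for s in samples)


def double(n: int) -> int:
    return 2 * n


def inc(n: int) -> int:
    return n + 1


def double_plus_one(n: int) -> int:
    return 2 * n + 1


samples = range(-100, 101)
print(agree_on(compose(double, inc), double_plus_one, samples))  # True
print(agree_on(compose(inc, double), double_plus_one, samples))  # False: gives 2n + 2
```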
A lot of mathy fields use diagrams like this:
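For instance, we can scale an image down (call this edge $s$) then rotate it ($r$), or rotate the image ($r'$) then scale it ($s'$), and get the same result either way. The primes distinguish edges that perform the same kind of operation but start from different objects: rotating the original image is a different edge from rotating the scaled-down image. The idea that we get the same result either way is summarized by the phrase “the diagram commutes”; thus the name “commutative diagram”. In terms of paths: we have path-equivalence $sr \simeq r's'$.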
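Here is a toy numerical check of the scale/rotate square in Python, assuming an “image” is just a list of points in the plane stored as complex numbers, with scaling and rotation both taken about the origin (these simplifications are mine). Following the two paths gives the same points, up to floating-point rounding.

```python
import cmath
import math


def scale(points, factor=0.5):
    """Scale every point about the origin (shrinks the image for factor < 1)."""
    return [p * factor for p in points]


def rotate(points, angle=math.pi / 6):
    """Rotate every point about the origin by `angle` radians."""
    turn = cmath.exp(1j * angle)
    return [p * turn for p in points]


image = [0 + 0j, 3 + 4j, -2 + 1j, 5 - 5j]  # a toy "image": a few points
path_1 = rotate(scale(image))  # scale, then rotate
path_2 = scale(rotate(image))  # rotate, then scale
print(all(cmath.isclose(a, b) for a, b in zip(path_1, path_2)))  # True
```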
Another way this often shows up: we have some problem which we could solve directly (call that edge $f$). But it’s easier to transform it into some other form ($T$; e.g. change coordinates or change variables), solve in that form ($g$), then transform back ($T^{-1}$):
Again, we say “the diagram commutes”. Now our path-equivalence says $f \simeq T g T^{-1}$, where, as with yz earlier, paths are read left to right.
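A classic instance is the slide-rule trick: multiplying two positive numbers directly, versus taking logarithms, adding, and exponentiating back. A minimal Python sketch, with illustrative function names; here “transform back” is `exp` applied to the single log-space answer.

```python
import math


def solve_directly(a: float, b: float) -> float:
    """The direct route: multiply."""
    return a * b


def to_logs(a: float, b: float) -> tuple:
    """Change variables (positive inputs only)."""
    return math.log(a), math.log(b)


def solve_in_logs(la: float, lb: float) -> float:
    """The easier problem in the new variables: addition."""
    return la + lb


def from_logs(l: float) -> float:
    """Transform the answer back."""
    return math.exp(l)


a, b = 12.5, 3.2
via_logs = from_logs(solve_in_logs(*to_logs(a, b)))
print(math.isclose(solve_directly(a, b), via_logs))  # True
```

Both paths land on the same number, up to floating-point rounding, which is all “the diagram commutes” asks for here.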
Talking about commutative diagrams is arguably the central purpose of category theory; our main tool for that will be “natural transformations”, which we’ll introduce shortly.
Pattern Matching and Functors
Think about how we use regexes. We write some pattern then try to match it against some string—e.g. “colou*r” matches “color” or “colour” but not “pink”. We can use that to pick out parts of a target string which match the pattern—e.g. we could find the query “color” in the target “every color of the rainbow”.
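The regex half of the analogy, using Python's standard `re` module. (`u*` means zero or more u's, so the pattern would also match “colouur”.)

```python
import re

pattern = re.compile(r"colou*r")
print([bool(pattern.fullmatch(s)) for s in ["color", "colour", "pink"]])  # [True, True, False]

# Finding a query inside a larger target string:
match = re.search(r"color", "every color of the rainbow")
print(match.span())  # (6, 11)
```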
We’d like to do something similar for categories. Main idea: we want to match objects (a.k.a vertices) in the query category to objects in the target category, and paths in the query category to paths in the target category, in a way that keeps the structure intact.
For example, consider a commutative square:
We’d like to use that as a query on some other category, e.g. our airport category. When we query for a commutative square in our airport category, we’re looking for two paths with the same start and end airports, (potentially) different intermediate airports, but the same overall cost. For instance, maybe Delta has flights from New York to Los Angeles via their hub in Atlanta, and Southwest has flights from New York to Los Angeles via their hub in Vegas, and market competition makes the prices of the two flight-paths equal.
We’ll come back to the commutative square query in the next section. For now, let’s look at some simpler queries, to get a feel for the building blocks of our pattern-matcher. Remember: objects to objects, paths to paths, keep the structure intact.
First, we could use a single-object category with no edges as a query:
This can match against any one object (a.k.a vertex) in the target category. Note that there is a path hiding in the query—the identity path, where we start at the object and just stay there. In general, our pattern-matcher will always match identity paths in the query with identity paths on the corresponding objects in the target category—that’s one part of “keeping the structure intact”.
Next-most complicated is the query with two objects:
This one is slightly subtle—it might match two different objects, or both query objects might match against the same target object. This is just the way pattern-matching works in category theory; there’s no rule to prevent multiple vertices/edges in the query from collapsing into a single vertex/edge in the target category. This is actually useful quite often—for instance, if we have some function which takes in two objects from the target category, then it’s perfectly reasonable to pass in the same object twice. Maybe we have a path-finding algorithm which takes in two airports; it’s perfectly reasonable to expect that algorithm to work even if we pass the same airport twice—that’s a very easy path-finding problem, after all!
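For a target with finitely many objects, the matches of these first two queries are easy to list. A Python sketch over a made-up three-airport target; note that the two-object query happily matches the same airport twice.

```python
from itertools import product

TARGET_OBJECTS = ["JFK", "ATL", "LAX"]  # hypothetical target objects

# Single-object query: one match per target object.
one_object_matches = list(TARGET_OBJECTS)

# Two-object query: any ordered pair, including repeats like ("JFK", "JFK").
two_object_matches = list(product(TARGET_OBJECTS, repeat=2))

print(len(one_object_matches))               # 3
print(len(two_object_matches))               # 9
print(("JFK", "JFK") in two_object_matches)  # True
```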
Next up, we add in an edge:
Now that we have a nontrivial path, it’s time to highlight a key point: we map paths to paths, not edges to edges. So if our target category contains something like A → B → C, then our one-edge query might match against the A → B edge, or it might match against the B → C edge, or it might match the whole path from A to C via B, even if there’s no direct edge from A to C. Again, this is useful quite often: if we’re searching for flights from New York to Los Angeles, it’s perfectly fine to show results with a stop or two in the middle. So our one-edge query doesn’t just match each edge; it matches each path between any two objects (including the identity path from an object to itself).
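To see that the one-edge query matches paths rather than just edges, here is a Python sketch that enumerates every path (identities included) in a target shaped like A → B → C; the edge names f and g are hypothetical. Each path is one match. The loop assumes the target has no cycles; with a loop there would be infinitely many paths.

```python
# Hypothetical target category shaped like A -> B -> C.
OBJECTS = ["A", "B", "C"]
EDGES = [("f", "A", "B"), ("g", "B", "C")]  # (name, source, target)


def all_paths(objects, edges):
    """Every path in an acyclic graph, as (start, end, edge_names)."""
    paths = [(o, o, ()) for o in objects]  # identity paths
    frontier = list(paths)
    while frontier:  # terminates only if the graph has no cycles
        extended = [
            (start, dst, names + (name,))
            for start, end, names in frontier
            for name, src, dst in edges
            if src == end
        ]
        paths.extend(extended)
        frontier = extended
    return paths


matches = all_paths(OBJECTS, EDGES)
print(len(matches))  # 6
print(matches[-1])   # ('A', 'C', ('f', 'g'))
```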
Adding more objects and edges generalizes in the obvious way:
This finds any two paths which start at the same object. As usual, one or both paths could be the identity path, and both paths could be the same.
The other main building block is equivalence between paths. Let’s consider a query with two edges between two objects, with the two edges declared to be equivalent:
You might expect that this finds any two equivalent paths. That’s technically true, but somewhat misleading. As far as category theory is concerned, there’s actually only one path here: we only care about paths up to equivalence (thank you to Eigil for pointing this out in the comments). So that “one” path will be mapped to “one” path in the target category; our query could actually match any number of paths, as long as they’re all equivalent. Looking back at our one-edge example from earlier, it’s possible that our one edge could be mapped to a whole class of equivalent paths: by mapping it to one path, we’re effectively selecting all the paths equivalent to that one.
A commutative square works more like we’d expect:
In our query, the two paths from upper-left to lower-right are equivalent, but they contain non-equivalent subpaths. So those subpaths may be mapped to non-equivalent paths in the target, as long as those non-equivalent paths compose into equivalent paths. In other words, we’re looking for a commutative square in the target, as we’d expect. (Though we can still find degenerate commutative squares, e.g. matches where the lower left and upper right corner map to the same object.)
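Here is a small Python sketch of running the commutative-square query against a toy airport category, using the Delta/Southwest story with invented fares (the United route is a non-matching distractor). For simplicity it only considers squares whose sides are single flights; the real query would also match longer paths along each side.

```python
from itertools import combinations

# Hypothetical one-way fares: (origin, destination, airline, price).
FLIGHTS = [
    ("NYC", "ATL", "Delta", 150), ("ATL", "LAX", "Delta", 200),
    ("NYC", "LAS", "Southwest", 180), ("LAS", "LAX", "Southwest", 170),
    ("NYC", "ORD", "United", 120), ("ORD", "LAX", "United", 260),
]


def two_leg_paths(flights):
    """Yield (origin, destination, total_price, legs) for every connecting pair."""
    for first in flights:
        for second in flights:
            if first[1] == second[0]:  # second leg departs where the first lands
                yield first[0], second[1], first[3] + second[3], (first, second)


def commutative_squares(flights):
    """Pairs of two-leg paths with the same endpoints and the same total price."""
    paths = list(two_leg_paths(flights))
    for p, q in combinations(paths, 2):
        if p[:3] == q[:3]:
            yield p[3], q[3]


squares = list(commutative_squares(FLIGHTS))
print(len(squares))                         # 1
print([legs[0][1] for legs in squares[0]])  # ['ATL', 'LAS']
```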
Category theorists call each individual match a “functor”. Each different functor—i.e. each match—maps the query category into the target category in a different way.
Note that the target is itself just a category, which means we could use it as a query on some third category. In this case, we can compose matches/functors: if one match tells me how to map category 1 into category 2, and another match tells me how to map category 2 into category 3, then I can combine those to find a map from category 1 into category 3.
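A sketch of composing matches, under simplifying assumptions of my own: a functor is stored as a map on objects plus a map sending each query edge to a target path (a tuple of edge names), extended to whole paths by concatenation. The sketch does not check endpoints or path equivalences, so it is bookkeeping rather than a full definition.

```python
from dataclasses import dataclass


@dataclass
class Functor:
    objects: dict  # query object -> target object
    edges: dict    # query edge name -> target path (tuple of edge names)

    def on_path(self, path: tuple) -> tuple:
        """Map a path edge by edge and concatenate the results.
        The empty (identity) path maps to the empty path."""
        return tuple(e for edge in path for e in self.edges[edge])

    def then(self, other: "Functor") -> "Functor":
        """Compose: self maps category 1 -> 2, other maps 2 -> 3."""
        return Functor(
            objects={o: other.objects[t] for o, t in self.objects.items()},
            edges={e: other.on_path(p) for e, p in self.edges.items()},
        )


# Category 1: one edge e: P -> Q.  Category 2: A -f-> B -g-> C.
# Category 3: one object X with a loop h.  All names are hypothetical.
F = Functor({"P": "A", "Q": "C"}, {"e": ("f", "g")})  # e matches the path fg
G = Functor({"A": "X", "B": "X", "C": "X"}, {"f": ("h",), "g": ("h", "h")})

composed = F.then(G)
print(composed.objects)  # {'P': 'X', 'Q': 'X'}
print(composed.edges)    # {'e': ('h', 'h', 'h')}
```

The composite also illustrates collapsing: G sends all three objects of category 2 to the single object X.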
Because category theorists love to go meta, we can even define a graph in which the objects are categories and the edges are functors. A path then composes functors, and we say that two paths are equivalent if they result in the same map from the original query category into the final target category. This is called “Cat”, the category of categories and functors. Yay meta.
Meanwhile, back on Earth (or at least low Earth orbit), commutative diagrams.
Exercise: Hopefully you now have an intuitive idea of how our pattern-matcher works, and what information each match (i.e. each functor) contains. Use your intuition to come up with a formal definition of a functor. Then, compare your definition to Wikipedia’s definition (jargon note: “morphism” = set of equivalent paths); is your definition equivalent? If not, what’s missing/extraneous in yours, and when would it matter?
Let’s start with a microscopic model of a pot of water. We have some “state”, representing the positions and momenta of every molecule in the water (or quantum field state, if you want to go even lower-level). There are things we can do to the water—boil it, cool it back down, add salt, stir it, wait a few seconds, etc—and each of these things will transform the water from one state to another. We can represent this as a category: the objects are states, the edges are operations moving the water from one state to another (including just letting time pass), and paths represent sequences of operations.
In physics, we usually don’t care how a physical system arrived in a particular state; the state tells us everything we need to know. That would mean that any two paths between the same start and end states are equivalent in this category (just like in the divisibility category). To make the example a bit more general, let’s assume that we do care about different ways of getting from one state to another: e.g. heating the water, then cooling it, then heating it again will definitely rack up a larger electric/gas bill than just heating it.
Microscopic models accounting for the position and momentum of every molecule are rather difficult to work with, computationally. We might instead prefer a higher-level macroscopic model, e.g. a fluid model where we just track average velocity, temperature, and chemical composition of the fluid in little cells of space and time. We can still model all of our operations—boiling, stirring, etc—but they’ll take a different form. Rather than forces on molecules, now we’re thinking about macroscopic heat flow and total force on each little cell of space at each time.
We can connect these two categories: given a microscopic state we can compute the corresponding macroscopic state. By explicitly including these microscopic → macroscopic transformations as edges, we can incorporate both systems into one category:
Note that multiple micro-states will map to the same macro-state, although I haven’t drawn any.
The key property in this two-part category is path equivalence (a.k.a. commutation). If we start at the leftmost microscopic state, stir (in micro), then transform to the macro representation, then that should be exactly the same as starting at the leftmost microscopic state, transforming to the macro representation, and then stirring (in macro). It should not matter whether we perform some operations in the macro or micro model; the two should “give the same answer”. We represent that idea by saying that two paths are equivalent: one path which transforms micro to macro and then stirs (in macro), and another path which stirs (in micro) and then transforms micro to macro. We have a commutative square.
In fact, we have a bunch of commutative squares. We can pick any path in the micro-model, find the corresponding path in the macro-model, add in the micro->macro transformations, and end up with a commutative square.
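A toy version of one side-face of the prism, in Python, under assumptions of my own: a micro-state is a tuple of per-molecule energies, the macro-state is their average, “heating” adds the same energy to every molecule (or to the average), and “stirring” just moves energy among molecules. Each check compares the two paths around a commutative square.

```python
import math
from statistics import fmean


def to_macro(micro: tuple) -> float:
    """micro -> macro edge: keep only the average energy."""
    return fmean(micro)


def heat_micro(micro: tuple, delta: float) -> tuple:
    return tuple(e + delta for e in micro)


def heat_macro(avg: float, delta: float) -> float:
    return avg + delta


def stir_micro(micro: tuple) -> tuple:
    """Toy stirring: energies move between molecules (here, a rotation)."""
    return micro[1:] + micro[:1]


def stir_macro(avg: float) -> float:
    """Stirring does not change the average."""
    return avg


state = (1.0, 2.0, 3.0, 4.0)
# Heat in micro then coarse-grain, versus coarse-grain then heat in macro:
print(math.isclose(to_macro(heat_micro(state, 5.0)), heat_macro(to_macro(state), 5.0)))  # True
print(math.isclose(to_macro(stir_micro(state)), stir_macro(to_macro(state))))            # True
# Different micro-states can collapse to the same macro-state:
print(to_macro((1.0, 2.0, 3.0, 4.0)) == to_macro((4.0, 3.0, 2.0, 1.0)))                 # True
```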
Main take-away: prism-shaped categories with commutative squares on their side-faces capture the idea of representing the same system and operations in two different ways, possibly with one representation less granular than the other. We’ll call these kinds of structures “natural transformations”.
Next step: we’d like to use our pattern-matcher to look for natural transformations.
We’ll start with some arbitrary category:
Then we’ll make a copy of it, and add edges from objects in the original to corresponding objects in the copy:
I’ll call the original category “system”, and the copy “model”.
To finish our pattern, we’ll declare path equivalences: if we follow an edge from system to model, then take any path within the model, that’s equivalent to taking the corresponding path within the system, and then following an edge from system to model. We declare those paths equivalent (as well as any equivalences in the original category, and any other equivalences implied, e.g. paths in which our equivalent paths appear as sub-paths).
Now we just take our pattern and plug it into our pattern-matcher, as usual. Our pattern-matcher will go looking for a system-model pair, all embedded within whatever target category we’re searching within. Each match is called a natural transformation; we say that the natural transformation maps the match of the system-part to the match of the model-part. Since we call matches “functors”, a category theorist would say that a natural transformation maps one functor (the match of the system-part) to another of the same shape (the match of the model-part).
Now for an important point: remember that, in our pot-of-water example, multiple microscopic states could map to the same macroscopic state. Multiple objects in the source are collapsed into a single object in the target. But our procedure for creating a natural transformation pattern just copies the whole source category directly, without any collapsing. Is our pot-of-water example not a true natural transformation?
It is. In the last section I said that it’s sometimes useful for our pattern-matcher to collapse multiple objects into one; the pot-of-water is an example where that matters. Our pattern-matcher may be looking for a copy of the micro model, but it will still match against the macro model, because it’s allowed to collapse multiple objects together into one.
More generally: because our pattern-matcher is allowed to collapse objects together, it’s able to find natural transformations in which the model is less granular than the system.
Meta
That concludes the actual content; now I’ll just talk a bit about why I’m writing this.
I’ve bounced off of category theory a couple times before. But smart people kept saying that it’s really powerful, in ways that sound related to my research, so I’ve been taking another pass at the subject over the last few weeks.
Even the best book I’ve found on the material seems burdened mainly by poor formulations of the core concepts and very limited examples. My current impression is that broader adoption of category theory is limited in large part by bad definitions, even when more intuitive equivalent definitions are available—“morphisms” vs “paths” is a particularly blatant example, leading to an entirely unnecessary profusion of identities in definitions. Also, of course, category theorists are constantly trying to go more abstract in ways that make the presentation more confusing without really adding anything in terms of explanation. So I’ve needed to come up with my own concrete examples and look for more intuitive definitions. This write-up is a natural by-product of that process.
I’d especially appreciate feedback on:
whether I’m missing key concepts or made crucial mistakes.
whether this was useful; I may drop some more posts along these lines if many people like it.
whether there’s some wonderful category theory resource which has already done something like this, so I can just read that instead. I would really, really prefer to do this the easy way.