# Category Theory Without The Baggage

If you are an alge­braic ab­strac­tol­o­gist, this post is prob­a­bly not for you. Fur­ther meta-com­men­tary can be found in the “meta” sec­tion, at the bot­tom of the post.

So you’ve heard of this thing called “cat­e­gory the­ory”. Maybe you’ve met some smart peo­ple who say that’s it’s re­ally use­ful and pow­er­ful for… some­thing. Maybe you’ve even cracked open a book or watched some lec­tures, only to find that the en­tire sub­ject seems to have been gen­er­ated by train­ing GPT-2 on a mix of alge­braic op­tom­e­try and out­put from the­p­roofistriv­ial.com.

What is this sub­ject? What could one do with it, other than write opaque math pa­pers?

This in­tro­duc­tion is for you.

This post will cover just the bare-bones foun­da­tional pieces: cat­e­gories, func­tors, and nat­u­ral trans­for­ma­tions. I will mostly es­chew the typ­i­cal pre­sen­ta­tion; my goal is just to con­vey in­tu­ition for what these things mean. Depend­ing on in­ter­est, I may post a few more pieces in this vein, cov­er­ing e.g. limits, ad­junc­tion, Yoneda lemma, sym­met­ric monoidal cat­e­gories, types and pro­gram­ming, etc—leave a com­ment if you want to see more.

Out­line:

• Cat­e­gory the­ory is the study of paths in graphs, so I’ll briefly talk about that and high­light some rele­vant as­pects.

• What’s a cat­e­gory? A cat­e­gory is just a graph with some no­tion of equiv­alence of paths; we’ll see a few ex­am­ples.

• Pat­tern match­ing: find a sub-cat­e­gory with a par­tic­u­lar shape. Matches are called “func­tors”.

• One sub-cat­e­gory mod­el­ling an­other: com­mu­ta­tive squares and nat­u­ral trans­for­ma­tions.

## Paths in Graphs

Here’s a graph:

Here are some paths in that graph:

• A → B

• B → C

• A → B → C

• A → A

• A → A → A (twice around the loop)

• A → A → A → B (twice around the loop, then to B)

• (triv­ial path—start at D and don’t go any­where)

• (triv­ial path—start at A and don’t go any­where)

In cat­e­gory the­ory, we usu­ally care more about the edges and paths than the ver­tices them­selves, so let’s give our edges their own names:

We can then write paths like this:

• A → B is writ­ten y

• B → C is writ­ten z

• A → B → C is writ­ten yz

• A → A is writ­ten x

• A → A → A is writ­ten xx

• A → A → A → B is writ­ten xxy

• The triv­ial path at D is writ­ten id_D (this is roughly a stan­dard no­ta­tion)

• The triv­ial path at A is writ­ten id_A

We can build longer paths by “com­pos­ing” shorter paths. For in­stance, we can com­pose y (aka A → B) with z (aka B → C) to form yz (aka A → B → C), or we can com­pose x with it­self to form xx, or we can com­pose xx with yz to form xxyz. We can com­pose two paths if-and-only-if the sec­ond path starts where the first one be­gins—we can’t com­pose x with z be­cause we’d have to mag­i­cally jump from A to B in the mid­dle.

Com­po­si­tion is asym­met­ric—com­pos­ing y with z is fine, but we can’t com­pose z with y.

No­tice that com­pos­ing id_A with x is just the same as x by it­self: if we start at A, don’t go any­where, and then fol­low x, then that’s the same as just fol­low­ing x. Similarly, com­pos­ing x with id_A is just the same as x. Sym­bol­i­cally: id_A x = x id_A = x. Math­e­mat­i­cally, id_A is an “iden­tity”—an op­er­a­tion which does noth­ing; thus the “id” no­ta­tion.

In ap­pli­ca­tions, graphs al­most always have data on them—at­tached to the ver­tices, the edges, or both. In cat­e­gory the­ory in par­tic­u­lar, data is usu­ally on the edges. When com­pos­ing those edges to make paths, we also com­pose the data.

A sim­ple ex­am­ple: imag­ine a graph of roads be­tween cities. Each road has a dis­tance. When com­pos­ing mul­ti­ple roads into paths, we add to­gether the dis­tances to find the to­tal dis­tance.

Fi­nally, in our origi­nal graph, let’s throw in an ex­tra edge from A to it­self:

Our graph has be­come a “multi­graph”—a graph with (po­ten­tially) more than one dis­tinct edge be­tween each ver­tex. Now we can’t just write a path as A → A → A any­more—that could re­fer to xx, xx’, x’x, or x’x’. In cat­e­gory the­ory, we’ll usu­ally be deal­ing with multi­graphs, so we need to write paths as a se­quence of edges rather than the ver­tices-with-ar­rows no­ta­tion. For in­stance, in our roads-and-cities ex­am­ple, there may be mul­ti­ple roads be­tween any two cities, so a path needs to spec­ify which roads are taken.

Cat­e­gory the­o­rists call paths and their as­so­ci­ated data “mor­phisms”. This a ter­rible name, and we mostly won’t use it. Ver­tices are called “ob­jects”, which is a less ter­rible name I might oc­ca­sion­ally slip into.

## What’s a cat­e­gory?

A cat­e­gory is:

• a di­rected multigraph

• with some no­tion of equiv­alence be­tween paths.

For in­stance, we could imag­ine a di­rected multi­graph of flights be­tween air­ports, with a cost for each flight. A path is then a se­quence of flights from one air­port to an­other. As a no­tion of equiv­alence, we could de­clare that two paths are equiv­a­lent if they have the same start and end points, and the same to­tal cost.

There is one im­por­tant rule: our no­tion of path-equiv­alence must re­spect com­po­si­tion. If path p is equiv­a­lent to q (which I’ll write ), and , then we must have . In our air­ports ex­am­ple, this would say: if two flight-paths p and q have the same cost (call it ), and two flight-paths x and y have the same cost (call it ), then the cost of px (i.e. ) must equal the cost of qy (also ).

Be­sides that, there’s a hand­ful of boiler­plate rules:

• Any path is equiv­a­lent to it­self (re­flex­ivity), and if and then (tran­si­tivity); these are the usual rules which define equiv­alence re­la­tions.

• Any paths with differ­ent start and end points must not be equiv­a­lent; oth­er­wise ex­pres­sions like “” might not even be defined.

Let’s look at a few more ex­am­ples. I’ll try to show some qual­i­ta­tively differ­ent cat­e­gories, to give some idea of the range available.

Air­ports & Flights

Our air­port ex­am­ple is already a fairly gen­eral cat­e­gory, but we could eas­ily add more bells and whis­tles to it. Rather than hav­ing a ver­tex for each air­port, we could have a ver­tex for each air­port at each time. Flights then con­nect an air­port at one time to an­other air­port at an­other time, and we need some zero-cost “wait” edges to move from an air­port at one time to the same air­port at a later time. A path would be some com­bi­na­tion of flights and wait­ing. We might ex­pect that the cat­e­gory has some sym­me­tries—e.g. “same flights on differ­ent days”—and later we’ll see some tools to for­mal­ize those.

Divisibility

As a com­pletely differ­ent ex­am­ple, con­sider the cat­e­gory of di­visi­bil­ity of pos­i­tive in­te­gers:

This cat­e­gory has a path from n to m if-and-only-if n is di­visi­ble by m (writ­ten m | n, pro­nounced “m di­vides n”, i.e. 2 | 12 is read “two di­vides twelve”). The “data” on the edges is just the di­visi­bil­ity re­la­tions—i.e. 6 | 12 or 5 | 15:

We can com­pose these: 2|6 and 6|12 im­plies 2|12. A path 12 → 6 → 2 in this cat­e­gory is, in some sense, a proof that 12 is di­visi­ble by 2 (given all the di­visi­bil­ity re­la­tions on the edges). Note that any two paths from 12 to 2 pro­duce the same re­sult—i.e. 12 → 4 → 2 also gives 2|12. More gen­er­ally: in this cat­e­gory, any two paths be­tween the same start and end points are equiv­a­lent.

Types & Functions

Yet an­other to­tally differ­ent di­rec­tion: con­sider the cat­e­gory of types in some pro­gram­ming lan­guage, with func­tions be­tween those types as edges:

This cat­e­gory has a LOT of stuff in it. There’s a func­tion for ad­di­tion of two in­te­gers, which goes from (int, int) to int. There’s an­other func­tion for mul­ti­pli­ca­tion of two in­te­gers, also from (int, int) to int. There are func­tions op­er­at­ing on lists, strings, and hash ta­bles. There are func­tions which haven’t been writ­ten in the en­tire his­tory of pro­gram­ming, with in­put and out­put types which also haven’t been writ­ten.

We know how to com­pose func­tions—just call one on the re­sult of the other. We also know when two func­tions are “equiv­a­lent”—they always give the same out­put when given the same in­put. So we have a cat­e­gory, us­ing our usual no­tions of com­po­si­tion and equiv­alence of func­tions. This cat­e­gory is the main fo­cus of many CS ap­pli­ca­tions of cat­e­gory the­ory (e.g. types in Haskell). Math­e­mat­i­ci­ans in­stead fo­cus on the closely-re­lated cat­e­gory of func­tions be­tween sets; this is ex­actly the same ex­cept that func­tions go from one set to an­other in­stead of one type to an­other.

Com­mu­ta­tive Diagrams

A lot of mathy fields use di­a­grams like this:

For in­stance, we can scale an image down () then ro­tate it () or ro­tate the image () then scale it (), and get the same re­sult ei­ther way. The idea that we get the same re­sult ei­ther way is sum­ma­rized by the phrase “the di­a­gram com­mutes”; thus the name “com­mu­ta­tive di­a­gram”. In terms of paths: we have path-equiv­alence .

Another way this of­ten shows up: we have some prob­lem which we could solve di­rectly. But it’s eas­ier to trans­form it into some other form (e.g. change co­or­di­nates or change vari­ables), solve in that form, then trans­form back:

Again, we say “the di­a­gram com­mutes”. Now our path-equiv­alence says .

Talk­ing about com­mu­ta­tive di­a­grams is ar­guably the cen­tral pur­pose of cat­e­gory the­ory; our main tool for that will be “nat­u­ral trans­for­ma­tions”, which we’ll in­tro­duce shortly.

## Pat­tern Match­ing and Functors

Think about how we use regexes. We write some pat­tern then try to match it against some string—e.g. “colou*r” matches “color” or “colour” but not “pink”. We can use that to pick out parts of a tar­get string which match the pat­tern—e.g. we could find the query “color” in the tar­get “ev­ery color of the rain­bow”.

We’d like to do some­thing similar for cat­e­gories. Main idea: we want to match ob­jects (a.k.a ver­tices) in the query cat­e­gory to ob­jects in the tar­get cat­e­gory, and paths in the query cat­e­gory to paths in the tar­get cat­e­gory, in a way that keeps the struc­ture in­tact.

For ex­am­ple, con­sider a com­mu­ta­tive square:

We’d like to use that as a query on some other cat­e­gory, e.g. our air­port cat­e­gory. When we query for a com­mu­ta­tive square in our air­port cat­e­gory, we’re look­ing for two paths with the same start and end air­ports, (po­ten­tially) differ­ent in­ter­me­di­ate air­ports, but the same over­all cost. For in­stance, maybe Delta has flights from New York to Los An­ge­les via their hub in At­lanta, and South­west has flights from New York to Los An­ge­les via their hub in Ve­gas, and mar­ket com­pe­ti­tion makes the prices of the two flight-paths equal.

We’ll come back to the com­mu­ta­tive square query in the next sec­tion. For now, let’s look at some sim­pler queries, to get a feel for the build­ing blocks of our pat­tern-matcher. Re­mem­ber: ob­jects to ob­jects, paths to paths, keep the struc­ture in­tact.

First, we could use a sin­gle-ob­ject cat­e­gory with no edges as a query:

This can match against any one ob­ject (a.k.a ver­tex) in the tar­get cat­e­gory. Note that there is a path hid­ing in the query—the iden­tity path, where we start at the ob­ject and just stay there. In gen­eral, our pat­tern-matcher will always match iden­tity paths in the query with iden­tity paths on the cor­re­spond­ing ob­jects in the tar­get cat­e­gory—that’s one part of “keep­ing the struc­ture in­tact”.

Next-most com­pli­cated is the query with two ob­jects:

This one is slightly sub­tle—it might match two differ­ent ob­jects, or both query ob­jects might match against the same tar­get ob­ject. This is just the way pat­tern-match­ing works in cat­e­gory the­ory; there’s no rule to pre­vent mul­ti­ple ver­tices/​edges in the query from col­laps­ing into a sin­gle ver­tex/​edge in the tar­get cat­e­gory. This is ac­tu­ally use­ful quite of­ten—for in­stance, if we have some func­tion which takes in two ob­jects from the tar­get cat­e­gory, then it’s perfectly rea­son­able to pass in the same ob­ject twice. Maybe we have a path-find­ing al­gorithm which takes in two air­ports; it’s perfectly rea­son­able to ex­pect that al­gorithm to work even if we pass the same air­port twice—that’s a very easy path-find­ing prob­lem, af­ter all!

Next up, we add in an edge:

Now that we have a non­triv­ial path, it’s time to high­light a key point: we map paths to paths, not edges to edges. So if our tar­get cat­e­gory con­tains some­thing like A → B → C, then our one-edge query might match against the A → B edge, or it might match against the B → C edge, or it might match the whole path A → C (via B) - even if there’s no di­rect edge from A to C. Again, this is use­ful quite of­ten—if we’re search­ing for flights from New York to Los An­ge­les, it’s perfectly fine to show re­sults with a stop or two in the mid­dle. So our one-edge query doesn’t just match each edge; it matches each path be­tween any two ob­jects (in­clud­ing the iden­tity path from an ob­ject to it­self).

Ad­ding more ob­jects and edges gen­er­al­izes in the ob­vi­ous way:

This finds any two paths which start at the same ob­ject. As usual, one or both paths could be the iden­tity path, and both paths could be the same.

The other main build­ing block is equiv­alence be­tween paths. Let’s con­sider a query with two edges be­tween two ob­jects, with the two edges de­clared to be equiv­a­lent:

You might ex­pect that this finds any two equiv­a­lent paths. That’s tech­ni­cally true, but some­what mis­lead­ing. As far as cat­e­gory the­ory is con­cerned, there’s ac­tu­ally only one path here—we only care about paths up to equiv­alence (thankyou to Eigil for point­ing this out in the com­ments). So that “one” path will be mapped to “one” path in the tar­get cat­e­gory; our query could ac­tu­ally match any num­ber of paths, as long as they’re all equiv­a­lent. Look­ing back at our one-edge ex­am­ple from ear­lier, it’s pos­si­ble that our one edge could be mapped to a whole class of equiv­a­lent paths—by map­ping it to one path, we’re effec­tively se­lect­ing all the paths equiv­a­lent to that one.

A com­mu­ta­tive square works more like we’d ex­pect:

In our query, the two paths from up­per-left to lower-right are equiv­a­lent, but they con­tain non-equiv­a­lent sub­paths. So those sub­paths may be mapped to non-equiv­a­lent paths in the tar­get, as long as those non-equiv­a­lent paths com­pose into equiv­a­lent paths. In other words, we’re look­ing for a com­mu­ta­tive square in the tar­get, as we’d ex­pect. (Though we can still find de­gen­er­ate com­mu­ta­tive squares, e.g. matches where the lower left and up­per right cor­ner map to the same ob­ject.)

Cat­e­gory the­o­rists call each in­di­vi­d­ual match a “func­tor”. Each differ­ent func­tor—i.e. each match—maps the query cat­e­gory into the tar­get cat­e­gory in a differ­ent way.

Note that the tar­get cat­e­gory is it­self a cat­e­gory—which means we could use it as a query on some third cat­e­gory. In this case, we can com­pose matches/​func­tors: if one match tells me how to map cat­e­gory 1 into cat­e­gory 2, and an­other match tells me how to map cat­e­gory 2 into cat­e­gory 3, then I can com­bine those to find a map from cat­e­gory 1 into cat­e­gory 3.

Be­cause cat­e­gory the­o­rists love to go meta, we can even define a graph in which the ob­jects are cat­e­gories and the edges are func­tors. A path then com­poses func­tors, and we say that two paths are equiv­a­lent if they re­sult in the same map from the origi­nal query cat­e­gory into the fi­nal tar­get cat­e­gory. This is called “Cat”, the cat­e­gory of cat­e­gories and func­tors. Yay meta.

Mean­while, back on Earth (or at least low Earth or­bit), com­mu­ta­tive di­a­grams.

Ex­er­cise: Hope­fully you now have an in­tu­itive idea of how our pat­tern-matcher works, and what in­for­ma­tion each match (i.e. each func­tor) con­tains. Use your in­tu­ition to come up with a for­mal defi­ni­tion of a func­tor. Then, com­pare your defi­ni­tion to wikipe­dia’s defi­ni­tion (jar­gon note: “mor­phism” = set of equiv­a­lent paths); is your defi­ni­tion equiv­a­lent? If not, what’s miss­ing/​ex­tra­ne­ous in yours, and when would it mat­ter?

## Nat­u­ral Transformations

Let’s start with a micro­scopic model of a pot of wa­ter. We have some “state”, rep­re­sent­ing the po­si­tions and mo­menta of ev­ery molecule in the wa­ter (or quan­tum field state, if you want to go even lower-level). There are things we can do to the wa­ter—boil it, cool it back down, add salt, stir it, wait a few sec­onds, etc—and each of these things will trans­form the wa­ter from one state to an­other. We can rep­re­sent this as a cat­e­gory: the ob­jects are states, the edges are op­er­a­tions mov­ing the wa­ter from one state to an­other (in­clud­ing just let­ting time pass), and paths rep­re­sent se­quences of op­er­a­tions.

In physics, we usu­ally don’t care how a phys­i­cal sys­tem ar­rived in a par­tic­u­lar state—the state tells us ev­ery­thing we need to know. That would mean that any path be­tween the same start and end states are equiv­a­lent in this cat­e­gory (just like in the di­visi­bil­ity cat­e­gory). To make the ex­am­ple a bit more gen­eral, let’s as­sume that we do care about differ­ent ways of get­ting from one state to an­other—e.g. heat­ing the wa­ter, then cool­ing it, then heat­ing it again will definitely rack up a larger elec­tric/​gas bill than just heat­ing it.

Micro­scopic mod­els ac­count­ing for the po­si­tion and mo­men­tum of ev­ery molecule are rather difficult to work with, com­pu­ta­tion­ally. We might in­stead pre­fer a higher-level macro­scopic model, e.g. a fluid model where we just track av­er­age ve­loc­ity, tem­per­a­ture, and chem­i­cal com­po­si­tion of the fluid in lit­tle cells of space and time. We can still model all of our op­er­a­tions—boiling, stir­ring, etc—but they’ll take a differ­ent form. Rather than forces on molecules, now we’re think­ing about macro­scopic heat flow and to­tal force on each lit­tle cell of space at each time.

We can con­nect these two cat­e­gories: given a micro­scopic state we can com­pute the cor­re­spond­ing macro­scopic state. By ex­plic­itly in­clud­ing these micro­scopic → macro­scopic trans­for­ma­tions as edges, we can in­cor­po­rate both sys­tems into one cat­e­gory:

Note that mul­ti­ple micro-states will map to the same macro-state, al­though I haven’t drawn any.

The key prop­erty in this two-part cat­e­gory is path equiv­alence (a.k.a. com­mu­ta­tion). If we start at the left­most micro­scopic state, stir (in micro), then trans­form to the macro rep­re­sen­ta­tion, then that should be ex­actly the same as start­ing at the left­most micro­scopic state, trans­form­ing to the macro rep­re­sen­ta­tion, and then stir­ring (in macro). It should not mat­ter whether we perform some op­er­a­tions in the macro or micro model; the two should “give the same an­swer”. We rep­re­sent that idea by say­ing that two paths are equiv­a­lent: one path which trans­forms micro to macro and then stirs (in macro), and an­other path which stirs (in micro) and then trans­forms micro to macro. We have a com­mu­ta­tive square.

In fact, we have a bunch of com­mu­ta­tive squares. We can pick any path in the micro-model, find the cor­re­spond­ing path in the macro-model, add in the micro->macro trans­for­ma­tions, and end up with a com­mu­ta­tive square.

Main take-away: prism-shaped cat­e­gories with com­mu­ta­tive squares on their side-faces cap­ture the idea of rep­re­sent­ing the same sys­tem and op­er­a­tions in two differ­ent ways, pos­si­bly with one rep­re­sen­ta­tion less gran­u­lar than the other. We’ll call these kinds of struc­tures “nat­u­ral trans­for­ma­tions”.

Next step: we’d like to use our pat­tern-matcher to look for nat­u­ral trans­for­ma­tions.

Then we’ll make a copy of it, and add edges from ob­jects in the origi­nal to cor­re­spond­ing ob­jects in the copy:

I’ll call the origi­nal cat­e­gory “sys­tem”, and the copy “model”.

To finish our pat­tern, we’ll de­clare path equiv­alences: if we fol­low an edge from sys­tem to model, then take any path within the model, that’s equiv­a­lent to tak­ing the cor­re­spond­ing path within the sys­tem, and then fol­low­ing an edge from sys­tem to model. We de­clare those paths equiv­a­lent (as well as any equiv­alences in the origi­nal cat­e­gory, and any other equiv­alences im­plied, e.g. paths in which our equiv­a­lent paths ap­pear as sub-paths).

Now we just take our pat­tern and plug it into our pat­tern-matcher, as usual. Our pat­tern matcher will go look­ing for a sys­tem-model pair, all em­bed­ded within what­ever tar­get cat­e­gory we’re search­ing within. Each match is called a nat­u­ral trans­for­ma­tion; we say that the nat­u­ral trans­for­ma­tion maps the sys­tem-part to the match of the model-part. Since we call matches “func­tors”, a cat­e­gory the­o­rist would say that a nat­u­ral trans­for­ma­tion maps one func­tor (the match of the sys­tem-part) to an­other of the same shape (the match of the model-part).

Now for an im­por­tant point: re­mem­ber that, in our pot-of-wa­ter ex­am­ple, mul­ti­ple micro­scopic states could map to the same macro­scopic state. Mul­ti­ple ob­jects in the source are col­lapsed into a sin­gle ob­ject in the tar­get. But our pro­ce­dure for cre­at­ing a nat­u­ral trans­for­ma­tion pat­tern just copies the whole source cat­e­gory di­rectly, with­out any col­laps­ing. Is our pot-of-wa­ter ex­am­ple not a true nat­u­ral trans­for­ma­tion?

It is. Last sec­tion I said that it’s some­times use­ful for our pat­tern-matcher to col­lapse mul­ti­ple ob­jects into one; the pot-of-wa­ter is an ex­am­ple where that mat­ters. Our pat­tern-matcher may be look­ing for a copy of the micro model, but it will still match against the macro model, be­cause it’s al­lowed to col­lapse mul­ti­ple ob­jects to­gether into one.

More gen­er­ally: be­cause our pat­tern-matcher is al­lowed to col­lapse ob­jects to­gether, it’s able to find nat­u­ral trans­for­ma­tions in which the model is less gran­u­lar than the sys­tem.

## Meta

That con­cludes the ac­tual con­tent; now I’ll just talk a bit about why I’m writ­ing this.

I’ve bounced off of cat­e­gory the­ory a cou­ple times be­fore. But smart peo­ple kept say­ing that it’s re­ally pow­er­ful, in ways that sound re­lated to my re­search, so I’ve been tak­ing an­other pass at the sub­ject over the last few weeks.

Even the best book I’ve found on the ma­te­rial seems bur­dened mainly by poor for­mu­la­tions of the core con­cepts and very limited ex­am­ples. My cur­rent im­pres­sion is that broader adop­tion of cat­e­gory the­ory is limited in large part by bad defi­ni­tions, even when more in­tu­itive equiv­a­lent defi­ni­tions are available—“mor­phisms” vs “paths” is a par­tic­u­larly blatant ex­am­ple, lead­ing to an en­tirely un­nec­es­sary profu­sion of iden­tities in defi­ni­tions. Also, of course, cat­e­gory the­o­rists are con­stantly try­ing to go more ab­stract in ways that make the pre­sen­ta­tion more con­fus­ing with­out re­ally adding any­thing in terms of ex­pla­na­tion. So I’ve needed to come up with my own con­crete ex­am­ples and look for more in­tu­itive defi­ni­tions. This write-up is a nat­u­ral by-product of that pro­cess.

I’d es­pe­cially ap­pre­ci­ate feed­back on:

• whether I’m miss­ing key con­cepts or made cru­cial mis­takes.

• whether this was use­ful; I may drop some more posts along these lines if many peo­ple like it.

• whether there’s some won­der­ful cat­e­gory the­ory re­source which has already done some­thing like this, so I can just read that in­stead. I would re­ally, re­ally pre­fer to do this the easy way.

• As an alge­braic ab­strac­tol­o­gist, let me just say this is an ab­solutely great post. My com­ments:

Cat­e­gory the­o­rists don’t dis­t­in­guish be­tween a cat­e­gory with two ob­jects and an edge be­tween them, and a cat­e­gory with two ob­jects and two iden­ti­fied edges be­tween them (the lat­ter ob­ject doesn’t re­ally even make sense in the usual ac­count). In gen­eral, the ex­tra equiv­alence re­la­tion that you have to carry around makes cer­tain things more com­pli­cated in this ver­sion.

I do tend to agree with you that think­ing of cat­e­gories as ob­jects, edges and an equiv­alence re­la­tion on paths is a more in­tu­itive per­spec­tive, but let me defend the tra­di­tional pre­sen­ta­tion. By far the most es­sen­tial/​pro­to­typ­i­cal ex­am­ples are the cat­e­gories of sets and func­tions, or types and func­tions. Here, it’s more nat­u­ral to speak of func­tions from x to y, than to speak of “com­pos­able se­quences of func­tions be­gin­ning at x and end­ing at y, up to the equiv­alence re­la­tion which iden­ti­fies two se­quences if they have the same com­pos­ite”.

Again, I ab­solutely love this post. I am frankly a bit shocked that no­body seems to have writ­ten an in­tro­duc­tion us­ing this lan­guage—I think ev­ery­one is too en­am­ored with sets as an ex­am­ple.

• Thanks for point­ing out the iden­ti­fied edges thing, I hadn’t no­ticed it be­fore. I’ll up­date the ex­am­ples once I’ve up­dated my in­tu­ition.

Also I’m glad you like it! :)

UPDATE: fixed it.

• Re­lat­edly, I’m go­ing back and forth in my head a bit about whether it’s bet­ter to ex­plain cat­e­gory the­ory in graph the­ory terms by iden­ti­fy­ing the mor­phisms with edges or with paths.

Mor­phisms = Edges

• In this ver­sion, a sub­set of mul­ti­di­graphs (ap­par­ently also called quiv­ers!), can be thought of as cat­e­gories—those for which ev­ery ver­tex has an edge to it­self, and for which when­ever there’s a path from A to B, there’s also an edge di­rectly from A to B.

• You also have to say:

• for each pair of edges from A to B and B to C, which edge from A to C cor­re­sponds to their composition

• for each node, which (of pos­si­bly mul­ti­ple) edge to it­self is its de­fault or iden­tity edge

• in such a way that the as­so­ci­a­tive and uni­tal laws hold

Mor­phisms = Paths

• In this ver­sion, any mul­ti­di­graph (quiver) can be thought of as a cat­e­gory.

• You get the iden­tities for free, be­cause they’re just the triv­ial, do-noth­ing, paths.

• You get com­po­si­tion for free, be­cause we already know what it means that fol­low­ing a path from A to M and then from M to Z is it­self a path.

• And you get the as­so­ci­a­tive and uni­tal laws for (al­most) free:

• uni­tal: do­ing noth­ing at the start or end of a path ob­vi­ously doesn’t change the path

• as­so­ci­a­tive: it’s nat­u­ral to think of the paths ((e1, e2), e3) and (e1, (e2, e3)) -- where e1, e2, and e3 are edges—as both be­ing the same path [e1, e2, e3]

• How­ever, you now have to add on one weird ex­tra rule that two paths that start and end at the same place can be con­sid­ered the same path, even if they didn’t go through the same in­ter­me­di­ate nodes.

• In other words, the in­tu­ition that com­pos­ing-pairs-of-paths-in-differ­ent-or­ders-always-gives-you-the-same-fi­nal-path gives you a suffi­cient, but not nec­es­sary con­di­tion for two paths be­ing con­sid­ered equiv­a­lent.

I think this fi­nal point in the mor­phisms = paths for­mu­la­tion might be what tripped you up in the case Eigil points out above, where cat­e­gory the­ory treats two ar­rows from A to B that are equiv­a­lent to each other as ac­tu­ally the same ar­row. This seems to be the one place (from what I can see so far) where the paths for­mu­la­tion gives the wrong in­tu­ition.

• The main rea­son I like the paths for­mu­la­tion is that I can look around at the real world and im­me­di­ately pick out sys­tems which might make sense to model as cat­e­gories. “Paths in graphs” is some­thing I can rec­og­nize at a glance. Think­ing about links on the in­ter­net? The mor­phisms are paths of links. Friend con­nec­tions on face­book? The mor­phisms are friend-of-friend paths. Plane flights? The mor­phisms are travel plans. Etc.

I ex­pect that the big ap­pli­ca­tions of cat­e­gory the­ory will even­tu­ally come from some­thing be­sides sets and func­tions, and I want to be able to rec­og­nize such ap­pli­ca­tions at a glance when op­por­tu­nity comes knock­ing.

The cost is need­ing to build some in­tu­ition for path equiv­alence. I’m still build­ing that in­tu­ition, and that is in­deed what tripped me up. It will come with prac­tice.

• Separate com­ment for refer­ences to clas­si­cal cat­e­gory the­ory books and re­sources. I don’t think any of these are ex­actly what you are look­ing for, but they each pro­pose a differ­ent per­spec­tive on these con­cepts, which might be the best we have now.

• The best text­book I know of is Cat­e­gory The­ory by Awodey. It is both rigor­ous and in­tu­itive, at least at the level of maths text­book. There are a lot of ex­am­ples, as con­crete as pos­si­ble, and the differ­ences be­tween them and how that in­form the ab­stract defi­ni­tions are treated in de­tails.

• Do not go to Ma­clane’s Cat­e­gory The­ory for Work­ing Math­e­mat­i­cian. Not that it is a bad book, just that it is the most hon­est ti­tle of a book I ever saw. Ma­clane writes for the work­ing math­e­mat­i­cian, so not even the grad­u­ate stu­dent in maths fits ex­actly his stan­dards.

• For a glimpse of the struc­tur­ing power of cat­e­gory the­ory and its links to physics and com­puter sci­ence, Physics, Topol­ogy, Logic and Com­pu­ta­tion: A Rosetta Stone by Baez and Stay is the place to go. This pa­per also ar­gues elo­quently that the most im­por­tant cat­e­gories are not the one similar to the cat­e­gory of sets.

• A short one that I like is Ba­sic Cat­e­gory The­ory for Com­puter Scien­tists by Pierce. It is short, to the point, and goes deeper into the ap­pli­ca­tions to the­o­ret­i­cal com­puter sci­ence. One caveat is that Pierce is the kind of com­puter sci­en­tist that stud­ies proof the­ory and teaches Coq and the­o­rem prov­ing. So it might be slightly too ab­stract for some peo­ple.

• I just found an­other in­ter­est­ing refer­ence: Cat­e­gories for the prac­tis­ing physi­cist. Although this is not ex­actly about dis­card­ing un­due ab­strac­tion, it does pre­sent many con­cepts in terms of con­crete ex­am­ples, and there are even real-world cat­e­gories defined in it!

• Caveat: I’m a bit of an alge­braic ab­strac­tol­o­gist with a not-that-deep un­der­stand­ing of cat­e­gory the­ory. Thus I might rep­re­sent the worst of both wor­lds, some­one defend­ing the ab­stract for the ab­stract with­out con­crete ar­gu­ments.

My first re­ac­tion when see­ing this post is that it was a great idea to give an in­tu­itive ex­pla­na­tion of cat­e­gory the­ory. My sec­ond re­ac­tion, when read­ing the in­tro­duc­tion and the “Path in Graphs” sec­tion, was to feel like ev­ery use­ful part of cat­e­gory the­ory had been thrown away. After read­ing the whole post, I feel there are great parts (no­tably the ex­tended con­crete ex­am­ple for func­tors), and oth­ers with which I dis­agree. I will try to clar­ify my dis­agree­ment.

• First, cat­e­gory the­ory is not about graphs. Cat­e­gories are not graphs. There not even graphs with a tiny bit of struc­ture on top. Cat­e­gories are ab­strac­tions of math­e­mat­i­cals struc­tures, which can be rep­re­sented as graph. One can even ar­gue that ap­pli­ca­tions of cat­e­gory the­ory are sim­ply about struc­tur­ing the un­der­ly­ing ob­jects and re­la­tions in such a way that what we want to prove is shown by a ba­sic di­a­gram. An in­tu­ition for this is that you can do cat­e­gory the­ory with­out us­ing the graphs.

• There are very good rea­son to use sets and the cat­e­gory of sets as pri­mar­ily ex­am­ples. The most ob­vi­ous one is that cat­e­gory the­ory aims to provide a fon­da­tion of math­e­math­ics in place of set the­ory. Thus start­ing with sets, which any stu­dent of mod­ern math­e­mat­ics know, is be­ing nice to the math­e­mat­i­cal reader. Also, pretty much any book on cat­e­gory the­ory uses the cat­e­gory of sets as a foil, to show that cat­e­gor­i­cal no­tions should not be given an in­tu­ition based only on set the­ory and func­tions on sets, as this in­tu­ition breaks down in other cases. For ex­am­ple, the fact that an iso­mor­phism in cat­e­gory the­ory is not just a bi­jec­tion as in the cat­e­gory of sets—for the cat­e­gory of posets, iso­mor­phisms must be or­der-pre­serv­ing while bi­jec­tions are not nec­es­sar­ily.

• Lastly, about the ap­pli­ca­tions you seem to search for cat­e­gory the­ory. I feel like you want to use cat­e­gory the­ory as a tool to solve a prob­lem, like you ap­ply an in­equal­ity to de­rive a new the­o­rem. But as far as I know, all uses of cat­e­gory the­ory come af­ter the fact. It is a for­mal and struc­tured per­spec­tive you can take on math­e­mat­i­cal ob­jects and how they re­late. It might give you an un­der­stand­ing of the un­der­ly­ing struc­ture be­hind your ap­proach or re­sults, but it rarely, if ever, gives you the means to ac­com­plish an effec­tive goal. Even the most con­crete ex­am­ples like mon­ads in Haskell come from peo­ple already com­mit­ted to find a solu­tion within cat­e­gory the­ory, not be­cause cat­e­gory the­ory was the best tool to solve the prob­lem at hand.

All that be­ing said, I still think try­ing to boil down the main con­cepts of cat­e­gory the­ory is a great idea. I’m only wor­ried that your ap­proach throws away too much of what makes the value of cat­e­gory the­ory.

• My sec­ond re­ac­tion, when read­ing the in­tro­duc­tion and the “Path in Graphs” sec­tion, was to feel like ev­ery use­ful part of cat­e­gory the­ory had been thrown away.

I was ex­pect­ing/​hop­ing some­one would say this. I’ll take the op­por­tu­nity to clar­ify my goals here.

I ex­pect “ap­plied cat­e­gory the­ory” is go­ing to be a field in its own right in the not-so-dis­tant fu­ture. When I say e.g. “broader adop­tion of cat­e­gory the­ory is limited in large part by bad defi­ni­tions”, that’s what I’m talk­ing about. I ex­pect prac­ti­cal ap­pli­ca­tions of cat­e­gory the­ory will mostly not re­sem­ble the set-cen­tered us­age of to­day, and I think that get­ting rid of the set-cen­tered view­point is one of the main bot­tle­necks to mov­ing for­ward.

(A good anal­ogy here might be group-the­ory-as-prac­ticed-by-math­e­mat­i­ci­ans vs group-the­ory-as-prac­ticed-by-physi­cists—a.k.a. rep­re­sen­ta­tion the­ory.)

I gen­er­ally agree that the us­age of cat­e­gory the­ory to­day benefits from not think­ing of things as graphs, from us­ing set as a pri­mary ex­am­ple, etc. But to­day, all uses of cat­e­gory the­ory come af­ter the fact. I want that to change, I’ve seen enough to think that will change, and that’s why I’m ex­per­i­ment­ing with a pre­sen­ta­tion which throws out most of the ex­ist­ing value.

• I don’t know if I should feel good or bad for tak­ing the bait… let’s say it make the de­bate progress.

From what you are say­ing, I think I get what you are aiming for. But I am also not sure that this is­sue you see is re­ally there. Be­cause ev­ery text­book on cat­e­gory the­ory that I read pointed in its first chap­ter or sec­tion that you should not use your in­tu­ition for sets in the ab­stract set­ting of cat­e­gory the­ory. Even the his­tor­i­cal ori­gins of cat­e­gory the­ory comes from alge­braic topol­ogy, where the in­fluence of sets and set the­ory is very dimmed.

That be­ing said, maybe there is some “Can­tor bias” in the mod­ern prac­tice of math­e­mat­ics, even when the point is to re­place set the­ory. That’s ac­tu­ally one as­pect of the Physics, Topol­ogy, Logic and Com­pu­ta­tion: A Rosetta Stone pa­per men­tioned in my other an­swer that I like a lot: the au­thors give differ­ent cat­e­gories, and ar­gue that the cat­e­gory best cap­tur­ing ap­pli­ca­tions in Physics and Com­puter Science is not the cat­e­gory of sets, but an­other be­hav­ing differ­ently.

I think I have two ques­tions fol­low­ing your com­ment:

• First, could you ex­pand your ex­am­ple of group the­ory in math vs group the­ory in physics? I know a bit a mathy group the­ory, and I know it is ap­plied to var­i­ous parts of physics, but I’m cu­ri­ous of clear cut differ­ences be­tween the uses.

• Se­cond, does the already ex­ist­ing field of Ap­plied Cat­e­gory The­ory (in­tro pa­per and in­tro book) fits your bill for ap­pli­ca­tions?

• I do like that Rosetta Stone pa­per you linked, thanks for that. And I also re­cently finished go­ing through a set of ap­plied cat­e­gory the­ory lec­tures based on that book you linked. That’s ex­actly the sort of thing which in­forms my in­tu­itions about where the field is headed, al­though it’s also ex­actly the sort of thing which in­forms my in­tu­ition that some key foun­da­tional pieces are still miss­ing. Prob­lem is, these “ap­pli­ca­tions” are mostly of the form “look we can for­mal­ize X in the lan­guage of cat­e­gory the­ory”… fol­lowed by not ac­tu­ally do­ing much with it. At this point, it’s not yet clear what things will be done with it, which in turn means that it’s not yet clear we’re us­ing the right for­mu­la­tions. (And even just look­ing at ap­plied cat­e­gory the­ory as it ex­ists to­day, the defi­ni­tions are definitely too un­wieldy, and will drive away any­one not de­ter­mined to use cat­e­gory the­ory for some rea­son.)

I’m the wrong per­son to write about the differ­ences in how math­e­mat­i­ci­ans and physi­cists ap­proach group the­ory, but I’ll give a few gen­eral im­pres­sions. Math­e­mat­i­ci­ans in group the­ory tend to think of groups ab­stractly, of­ten only up to iso­mor­phism. Physi­cists tend to think of groups as ma­trix groups; the rep­re­sen­ta­tion of group el­e­ments as ma­tri­ces is cen­tral. Physi­cists have fa­mously lit­tle pa­tience for the very ab­stract for­mu­la­tion of group the­ory of­ten used in math; thus the ap­peal of more con­crete ma­trix groups. Math­e­mat­i­ci­ans of­ten use group the­ory just as a lan­guage for var­i­ous things, with­out even us­ing any par­tic­u­lar re­sult—e.g. many things are defined as quo­tient groups. Again, physi­cists have no pa­tience for this. Physi­cists’ use of group the­ory tends to in­volve more con­crete ob­jec­tives—e.g. eval­u­at­ing in­te­grals over Lie groups. Fi­nally, physi­cists al­most always as­cribe some phys­i­cal sym­me­try to a group; it’s not just sym­bols.

• So your path-based ap­proach to cat­e­gory the­ory would be analo­gous to the ma­trix-based ap­proach of group the­ory in physics? That is, re­mov­ing the ab­strac­tion that made us stum­ble into the­ses con­cepts in the first place, and keep­ing only what is of use for our ap­pli­ca­tions?

I would like to see that. I’m not sure that your own propo­si­tion is the right one, but the idea is ex­cit­ing.

• Yup, that’s ba­si­cally the idea.

• It seems like this post was ap­pre­ci­ated mostly by those who already un­der­stand enough of ab­stract math to make sense of it, less so by the un­ini­ti­ated. I’ve only ever had an in­tu­itive un­der­stand­ing of CT, and my rele­vant math level is prob­a­bly just around the first course in alge­braic topol­ogy, which is still some­what higher than the LW av­er­age, yet I still strug­gle to keep my in­ter­est read­ing through your post.

For com­par­i­son, I wrote a very in­for­mal com­ment on how em­bed­ded agent’s mod­el­ing ab­strac­tion lev­els can map into CT con­cepts. Which didn’t get much trac­tion. But maybe start­ing with the ex­am­ples already fa­mil­iar and rele­vant to the au­di­ence could be some­thing to try when in­tro­duc­ing CT to the LW masses.

• Ooh that’s a good com­ment you linked. You men­tion re­lat­ing var­i­ous ap­pli­ca­tions of the pois­son equa­tion via nat­u­ral trans­for­ma­tion; could you un­pack a bit what that would look like? One of the things I’ve had trou­ble with is how to rep­re­sent the sorts of real-world ab­strac­tions I want to think about (e.g. pois­son equa­tion) in the lan­guage of cat­e­gory the­ory; it’s still un­clear to me whether mor­phism re­la­tion­ships are enough to rep­re­sent all the se­man­tics. If you know how to do it with that ex­am­ple, it would be re­ally helpful.

• Try­ing to write it up bet­ter, we’ll see if this will work.

• I am a math­e­mat­i­cian who is us­ing cat­e­gory the­ory all the time in my work in alge­braic ge­om­e­try, so I am ex­actly the wrong au­di­ence for this write-up!

I think that talk­ing about “bad defi­ni­tions” and “con­fus­ing pre­sen­ta­tion” is need­lessly con­fronta­tional. I would rather say that the tra­di­tional pre­sen­ta­tion of cat­e­gory the­ory is perfectly adapted to its origi­nal pur­pose, which is to or­ganise and to clar­ify com­pli­cated struc­tures (alge­braic, topolog­i­cal, ge­o­met­ric, …) in pure math­e­mat­ics. There the ba­sic ex­am­ples of cat­e­gories are things like the cat­e­gory of groups, rings, vec­tor spaces, topolog­i­cal spaces, man­i­folds, schemes, etc. and the no­tion of mor­phism, i.e. “struc­ture-pre­serv­ing map”, is com­pletely nat­u­ral.

As cat­e­gory the­ory is ap­plied more broadly in com­puter sci­ence and the the­ory of net­works and pro­cesses, it is great that new per­spec­tives on the ba­sic con­cepts are de­vel­oped, but I think they should be thought of as com­ple­men­tary to the tra­di­tional view, which is ex­tremely pow­er­ful in its do­main of ap­pli­ca­tion.

• the tra­di­tional pre­sen­ta­tion of cat­e­gory the­ory is perfectly adapted to its origi­nal purpose

I think this is too gen­er­ous. The tra­di­tional way of con­cep­tu­al­iz­ing a given math sub­ject is usu­ally just a minor mod­ifi­ca­tion of the origi­nal con­cep­tu­al­iza­tion. There’s a good rea­son for this, which is that up­dat­ing the already known con­cep­tu­al­iza­tion across a com­mu­nity is a re­ally hard co­or­di­na­tion prob­lem—but this also means that the pre­sen­ta­tion of sub­jects has very lit­tle op­ti­miza­tion pres­sure to­wards be­ing more us­able.

• In group the­ory, a group can be defined ab­stractly as a set with a bi­nary op­er­a­tion obey­ing cer­tain ax­ioms, or con­cretely as a bunch of per­mu­ta­tions on some set (which doesn’t need to in­clude all per­mu­ta­tions, but must be closed un­der com­po­si­tion and in­verse). The two views are equiv­a­lent by Cayley’s the­o­rem, and I think the sec­ond view is more helpful, at least for be­gin­ners.

I don’t know very much about cat­e­gory the­ory, but maybe we could do some­thing similar there? Since ev­ery small cat­e­gory has a faith­ful func­tor into Set, it can be defined as a bunch of sets and func­tions be­tween them. It doesn’t need to in­clude all sets or func­tions, but must be closed un­der com­po­si­tion and in­clude each set’s iden­tity func­tion to it­self.

For ex­am­ple, the di­visi­bil­ity cat­e­gory from the post can be seen as a cat­e­gory of sets like {1,...,n} and func­tions that are uni­tal ring ho­mo­mor­phisms from Z/​mZ to Z/​nZ (of which there’s ex­actly one if n di­vides m, and zero oth­er­wise). And the cat­e­gory of types and func­tions in some pro­gram­ming lan­guage can be seen as a cat­e­gory con­tain­ing some sets of things-with-bot­toms and mono­tone func­tions be­tween them. So in both of these cases, go­ing to sets leads to some nice math.

I’ve heard that the set in­tu­ition starts to break down once you study more cat­e­gory the­ory, but haven’t got­ten to that point yet.

• This is maybe only use­ful to me and a hand­ful of other folks, but I think of cat­e­gory the­ory some­times as a gen­er­al­iza­tion of topol­ogy. Rather than topolo­gies you have cat­e­gories and rather than lift­ings you have func­tors (this is not a strict, math­e­mat­i­cal gen­er­al­iza­tion, but an in­tu­itive gen­er­al­iza­tion of how the pic­ture in my head of how topol­ogy works gen­er­al­izes to how cat­e­gory the­ory works, so don’t come at me that this is not strictly true).

• This is more math­e­mat­i­cally jus­tified than you seem to think. Posets are topolog­i­cal spaces and cat­e­gories, and ev­ery space is weak ho­mo­topy equiv­a­lent to a poset space, which ex­plains why the in­tu­ition works so well.

• My cur­rent im­pres­sion is that broader adop­tion of cat­e­gory the­ory is limited in large part by bad defi­ni­tions, even when more in­tu­itive equiv­a­lent defi­ni­tions are available—“mor­phisms” vs “paths”

Just wanted to note that I re­cently learned some of the very ba­sics of cat­e­gory the­ory and found my­self ask­ing of the pre­sen­ta­tions I came across, “Why are you in­tro­duc­ing this as be­ing about dots and ar­rows and not im­me­di­ately tel­ling me how this is the same as or differ­ent from graph the­ory?”

I had to go and find this an­swer on Math.Stack­Ex­change to ex­plain the re­la­tion­ship, which was helpful.

So I think you’re on the right track to em­pha­size paths, at least for any­one who knows about graphs.

• I also liked the ex­pla­na­tion of nat­u­ral trans­for­ma­tions as be­ing about prisms, I found that image helpful.

• e.g. “colou*r” matches “color” or “colour” but not “pink”.

Is this cor­rect? I’d have thought “colo*r” matches to both “color” and “colour”, but “colou*r” only to “colour”.

Next-most com­pli­cated

Least com­pli­cated?

I’m very likely to read ev­ery post you write on this topic – I’ve got­ten this book a while ago, and while it’s not a pri­or­ity right now, I do in­tend to read it, and hav­ing two differ­ent sources ex­plain­ing the ma­te­rial from two ex­plic­itly differ­ent an­gles is quite nice. (I’m men­tion­ing this to give you a an idea of what kind of au­di­ence gets value out of your post; I can’t judge whether it’s an an­swer to your cat­e­gory re­source ques­tion, al­though it seems very good to me.)

I ini­tially thought that the clouds were meant to de­pict matches and was won­der­ing why it wasn’t what I thought it should be, be­fore re­al­iz­ing that they always de­pict the same stuff and were meant to de­pict “all stuff” be­fore we figure out what the matches are.

• Is this cor­rect? I’d have thought “colo*r” matches to both “color” and “colour”, but “colou*r” only to “colour”.

It’s cor­rect. Some pat­tern match­ers work the way you de­scribe, but in a reg­u­lar ex­pres­sion “u*” matches “zero or more u char­ac­ters”. So “colou*r” matches “color”, “colour”, “colou­u­u­u­u­u­u­uur”, etc.

(In this case one would typ­i­cally use “colou?r”; “u?” matches “ex­actly zero or one u char­ac­ters”, that is, “color”, “colour”, and noth­ing else.)

• I didn’t read all the way through (I stopped read­ing mid­way through the ex­tended air­port ex­am­ple), so for­give me if this is an­swered already; if you say the an­swer’s in there, I’ll go back and reread. But, in case it’s not, my ques­tion is: what would I gain from us­ing cat­e­gory the­ory for prob­lems like this, in­stead of graph the­ory (on which there already ex­ists a vast liter­a­ture)?

• I think, rather than “cat­e­gory the­ory is about paths in graphs”, it would be more rea­son­able to say that cat­e­gory the­ory is about paths in graphs up to equiv­alence, and in par­tic­u­lar about prop­er­ties of paths which de­pend on their re­la­tions to other paths (more than on their re­la­tion­ship to the ver­tices)*. If your prob­lem is most use­fully con­cep­tu­al­ized as a ques­tion about paths (find­ing the short­est path be­tween two ver­tices, or count­ing paths, or some­thing in that genre, you should definitely look to the graph the­ory liter­a­ture in­stead)

* I re­al­ize this is to­tally in­com­pre­hen­si­ble, and doesn’t make the case that there are any in­ter­est­ing prob­lems like this. I’m not try­ing to ar­gue that cat­e­gory the­ory is use­ful, just clar­ify­ing that your in­tu­ition that it’s not use­ful for prob­lems that look like these ex­am­ples is right.

• Great ques­tion. I don’t think I an­swer it out­right, but the sec­tion on nat­u­ral trans­for­ma­tions should at least offer some in­tu­ition for the kind of ques­tions which cat­e­gory the­ory looks at, but which graph the­ory doesn’t re­ally look at. That doesn’t an­swer the ques­tion of what you’d gain, and frankly, I have yet to see a re­ally com­pel­ling an­swer to that ques­tion my­self—cat­e­gory the­ory has an awful lot of defi­ni­tions, but doesn’t seem to ac­tu­ally do much with them. But I’m still pretty new to this, so I’m hold­ing out hope.

• Cat­e­gory the­ory pro­vides a the­o­ret­i­cal struc­ture that fits many (all) fields of math­e­mat­ics. Some non-triv­ial in­sights from it are that:

• The no­tion of ‘in­vert­ible’ de­pends on the con­text (speci­fi­cally, it needs to let you re­cover the iden­tity path, which need not be a set-the­o­retic iden­tity func­tion. I think this is what Eigil is say­ing above. As an ex­am­ple, con­sider mea­sure the­ory where we in­tro­duce ‘equal up to sets of mea­sure 0’).

• The prop­er­ties of an ob­ject are en­coded in the maps we al­low to/​from the ob­ject (as a weak ex­am­ple, a topol­ogy in­duced by a col­lec­tion of maps. As a strong ex­am­ple: Yoneda’s Lemma).

• Some prop­er­ties or con­struc­tions are uni­ver­sal in math­e­mat­ics (di­rect prod­ucts, di­rect limits, in­verse limits, ini­tial ob­jects, fi­nal ob­jects), and study­ing these in a gen­eral set­ting pro­vides a lot of in­sight in ac­tual ap­pli­ca­tions.

Also per­son­ally I’ve had great mileage out of notic­ing when my wild ideas could not be cast in cat­e­gory-the­o­retic form, which is a huge red flag and let me iden­tify er­rors in my rea­son­ing quickly and cleanly.

• The prob­lem is that these are all just defi­ni­tions; they don’t ac­tu­ally say any­thing about the things they’re defin­ing, and the ex­tent to which real-world sys­tems fit these defi­ni­tions is not ob­vi­ous.

• Ok, we can define in­vert­ibil­ity in a way that de­pends on the con­text, but what does that ac­tu­ally buy us? What’s a prac­ti­cal ap­pli­ca­tion where we say “oh, this is a spe­cial kind of in­verse” and thereby prove some­thing we wouldn’t have proved oth­er­wise?

• “The prop­er­ties of an ob­ject are en­coded in the maps we al­low to/​from the ob­ject” is not a fact about the world, it is a fact about what sort of things cat­e­gory the­ory talks about. One of my big open ques­tions is whether “maps to/​from the ob­ject” are ac­tu­ally suffi­cient to de­scribe all the prop­er­ties I care about in real-world sys­tems, like a pot of wa­ter.

• Univer­sal con­struc­tions are cute, but I have yet to see a new in­sight from them—i.e. a fact about a par­tic­u­lar sys­tem which wasn’t already fairly ob­vi­ous. Heck, I have yet to even see a gen­eral the­o­rem about a uni­ver­sal con­struc­tion other than “if it ex­ists, then it’s unique up to iso­mor­phism”.

Th­ese aren’t rhetor­i­cal ques­tions, btw—I’d re­ally ap­pre­ci­ate meaty ex­am­ples, it would make learn­ing this stuff a lot less frus­trat­ing.

• I’ll give it the good old col­lege try, but I’m no ex­pert on cat­e­gory the­ory by any means. I’ve always been told that a real-world ap­pli­ca­tion of cat­e­gory the­ory is kind of an in-joke; you’re not al­lowed to say they don’t ex­ist, but no­body has any. The pur­pose of cat­e­gory the­ory is to re­place (naive) set the­ory as a foun­da­tion of math­e­mat­ics. It’s there­fore not re­ally sur­pris­ing that this is far re­moved from de­scribing prop­er­ties of a pot of wa­ter, for ex­am­ple.

I think in light of this my pre­vi­ous bul­let points weren’t that hor­rible. If you ask some­thing like “I’m walk­ing down the street, sud­denly I see a house on fire. How does cat­e­gory the­ory help me de­cide what to do next?” the an­swer is “it does not, not even the slight­est bit”. In­stead it helps on a very ab­stract meta-level: it is there to struc­ture the thoughts you have about fields of math­e­mat­ics, which in turn will let you gain faster and deeper in­sight into those fields. Often (for ex­am­ple in alge­bra) even this will not trans­late to real-world ap­pli­ca­tions, but you can then use those fields to bet­ter ab­sorb some­thing to make pre­dic­tions about the real world. As an ex­am­ple I’m think­ing of Cat­e­gory the­ory helping you learn Group the­ory which helps you learn Renor­mal­iza­tion the­ory which gains you real-world in­sight into com­pli­cated many-par­ti­cle sys­tems, like a gas out of ther­mo­dy­namic equil­ibrium. If you are look­ing for any­thing more di­rect than this I flat out don’t have an an­swer for you.

Fi­nally a bit about the bul­let points:

• The study of non-in­vert­ible lin­ear op­er­a­tors on in­finite di­men­sional vec­tor spaces led to no­tions of spec­trum, and later the Gelfand-Naimark rep­re­sen­ta­tion and study of Fred­holm op­er­a­tors, which are some of the core deep ideas be­hind math­e­mat­i­cal for­mu­la­tions of quan­tum me­chan­ics (amongst other fields). Also the differ­ence be­tween [a lin­ear map failing to be an iso­mor­phism be­cause it is not bi­jec­tive] (re­lated to the so-called Point spec­trum) and [a lin­ear, bi­jec­tive map that is still not an iso­mor­phism be­cause its in­verse map is not a mor­phism] (re­lated to the so-called Essen­tial spec­trum) is of great im­por­tance when us­ing nu­mer­i­cal simu­la­tions to de­ter­mine be­havi­our of differ­en­tial equa­tions. In par­tic­u­lar, the point spec­trum de­pends on the im­ple­men­ta­tion and the es­sen­tial spec­trum does not. As far as I know this is a topic of in­tense de­bate in simu­la­tions of fluid/​air flow and tur­bu­lence.

• This is a fair point, and I don’t have an an­swer for you. I do briefly want to re­mark that (to my knowl­edge) all of math­e­mat­ics, both mod­ern and old, falls un­der the um­brella of cat­e­gory the­ory some­how. So the claim “I might be look­ing for prop­er­ties that this the­ory can­not cap­ture” has a high bur­den of proof.

• Per­son­ally I run into this reg­u­larly—state­ments like “the topol­ogy gen­er­ated by this norm is equiv­a­lent to the product topol­ogy from these un­der­ly­ing spaces, there­fore we now have the fol­low­ing (uni­ver­sal) prop­er­ties”. On the other hand I don’t have a gen­eral the­o­rem about uni­ver­sal con­struc­tions for you that says some­thing very ex­cit­ing. I think uni­ver­sal con­struc­tions are more about pre-caching knowl­edge, so that you may im­me­di­ately use a range of (in your words fairly ob­vi­ous) re­sults af­ter you ver­ify some com­monly oc­cur­ring con­di­tions.

• What seems the ad­van­tage to me is that cat­e­gory the­ory lets you ac­count for more stuff within the for­mal­isms.

Case in point, I did my dis­ser­ta­tion in graph the­ory, and what I can tell you from prov­ing hun­dreds of state­ments about graphs is that most of the the­ory of how and why graphs be­have cer­tain ways ex­ists out­side what can be cap­tured for­mally by the the­ory. This is of­ten what frus­trates peo­ple about graph the­ory and all of com­bi­na­torics: the proofs come gen­er­ally not by piling lots of pre­vi­ous re­sults but by see­ing through to some fun­da­men­tal truth about what is go­ing on with the col­lec­tion (“cat­e­gory”) of graphs you are in­ter­ested in and then us­ing one of a hand­ful of proof tricks (“func­tors”) to trans­form one thing into an­other to get your proof.

• Now we just take our pat­tern and plug it into our pat­tern-matcher, as usual.

Pre­sum­ably, the pat­tern is the query cat­e­gory. What is the tar­get cat­e­gory? (not to be con­fused with the part of the pat­tern you called tar­get—use differ­ent names?)

• Sorry, there’s a few too many things here which need names. The pat­tern (origi­nal cat­e­gory + copy + etc) is the query; the tar­get is what­ever cat­e­gory we like. We’re look­ing for the origi­nal cat­e­gory and a model of it, all em­bed­ded within some other cat­e­gory. I should prob­a­bly clar­ify that in the OP; part what makes it in­ter­est­ing is that we’re look­ing for a map-ter­ri­tory pair all em­bed­ded in one big sys­tem.

UPDATE: changed the source-tar­get lan­guage to sys­tem-model for nat­u­ral trans­for­ma­tions.

• Nat­u­ral trans­for­ma­tions can be com­posed (in two ways) - how does your for­mu­la­tion ex­press this?

• Good ques­tion, I spent a while chew­ing on that my­self.

Ver­ti­cal com­po­si­tion is rel­a­tively sim­ple—we’re look­ing for two nat­u­ral trans­for­ma­tions, where pat­tern-tar­get of one is pat­tern-source of the other. Equiv­a­lently, we’re pat­tern-match­ing on a pat­tern which has three copies of the origi­nal cat­e­gory stacked one-on-top-the-other, rather than just two copies.

Hori­zon­tal com­po­si­tion is trick­ier. We’re tak­ing the pat­tern for a nat­u­ral trans­for­ma­tion (i.e. two copies of the origi­nal cat­e­gory) and us­ing that as the origi­nal cat­e­gory for an­other nat­u­ral trans­for­ma­tion. So we end up with a pat­tern con­tain­ing four copies of the origi­nal cat­e­gory, con­nected in a square shape, with ar­rows go­ing from one cor­ner (the pat­tern-source for the com­pos­ite trans­for­ma­tion) to the op­po­site cor­ner (the pat­tern-tar­get for the com­pos­ite trans­for­ma­tion).

• I think the query cat­e­gory is the pat­tern, as you say, and the tar­get cat­e­gory is [origi­nal cat­e­gory + copy + edges be­tween them]. That way, if the match­ing pro­cess re­turns a match, that match cor­re­sponds to a path-that-is-equiv­a­lent-to-the-path-in-the-query-cat­e­gory.

• But the pat­tern was already defined as [origi­nal cat­e­gory + copy + edges be­tween them + path equiv­alences] :(

• I be­lieve query and tar­get cat­e­gory are the same here, but af­ter read­ing it again, I see that I don’t fully un­der­stand the re­spec­tive para­graph.

• David Spi­vak offers an ac­count of Cat­e­gories as database schemas with path equiv­alen­cies that is similar to the ac­count you’ve given here in his book Cat­e­gory The­ory for the Sciences. He still pre­sents the tra­di­tional defi­ni­tions, giv­ing ex­am­ples mainly from the cat­e­gory of sets and func­tions. I also didn’t find his pre­sen­ta­tion of database schema defi­ni­tion es­pe­cially easy to un­der­stand, but it is very use­ful when you re­al­ize that a func­tor is a sys­tem­atic mi­gra­tion of data be­tween schemas.