Sets and Functions

Sets and functions are two of the most basic ideas in mathematics, and we’ll need to know what they are to discuss some things about categories rigorously. Normally you’d learn about sets and functions way before encountering category theory, but in the spirit of assuming as little math as possible, we should write this post. It’s also worth addressing a few matters for their conceptual relevance.

Sets are imaginary bags we put things into. For example, you can take a dog, a cat, and a shoe, put them in an imaginary bag, and now you have a set consisting of ${d o g, c a t, s h o e}$ . The members of the set—dog, cat, and shoe—are called the elements of the set.

A subtle but important aspect of a set is that the imaginary bag has to be defined by a rule. This rule can be pretty much anything, like “put into a bag everything I’m pointing at,” but it does have to be a rule. Typically, sets can fit pretty much anything in, and so you can often just say “here is my set” rather than having to be explicit about the rule. We’ll get back to why the rule matters at the end. For now, sets are imaginary bags that you can put pretty much anything into.

What are we putting into these bags, exactly? Pretty much anything, yes—but clearly we aren’t actually putting dogs, cats, and shoes into bags. Mathematically, what are these things?

That is to say, what’s the difference between the set ${d o g}$ and the set ${c a t}$ ?

Well, what’s the difference between the equations $x + 2 = 3$ and $y + 2 = 3$ ? Nothing but the name of the variable—which does not matter at all. We could call $x$ anything. We could represent it with a thirty-foot tall watercolor of a fire truck.

So what’s the difference between the set {dog} and the set {cat}? Only the name of the element—which does not matter at all. Just like we can take posets like $1 \to 2 \to 3$ and $a \to b \to c$ and represent their common structure abstractly as $∙ \to ∙ \to ∙$ , we can do the same for ${d o g}$ and ${c a t}$ with this set: ${∙}$ .

The set ${∙}$ is what a one-element set like {dog} or {cat} really is. There’s no mathematical meaning to ${d o g}$ that actually makes the element of this set a four-legged barking creature. It’s just an element that happens to be labeled “dog.”

So why do we care about sets? Set theory is really important to mathematics, but from a category-theoretic perspective, they actually aren’t very important at all. Instead, sets serve one major purpose: sets let us define functions, and functions are really, really important!

Functions are maps between sets that meet a few rules. First of all, let’s just talk about the “maps” part of that. Think of sets as countries, and the elements of the sets as cities in that country. A map between the two sets is a map telling you how to go from the cities in one country to the cities in the other country.

But of course, math is a little more specific than that. So let’s say you have one set $A = {a, b, c}$ and another set $X = {x, y, z}$ . What does it mean to define a map—a morphism, in fact—from $A$ to $X$ ?

Well, it’s pretty simple in the end. You have to start from a city in $A$ , so one of $a, b$ , or $c$ . And you have to end up in a city in $X$ , so one of $x, y,$ or $z$ . Let’s say you go from $a$ to $x$ . Then the map is just...the set of where you started from and where you ended up. That is, $a$ and $x$ , respectively.

That’s it! It’s a short trip. There’s not much to sets in the first place, so there’s not much to maps between them. (Technically, sets have no structure—they’re imaginary bags filled with black dots—and so the maps between them are as simple as a map across a country with no geography.) But let’s point out one thing: we do need this map to tell us where we started and where we ended. In a set, the order of the elements doesn’t mean anything. For example, the set ${a p p l e, b a n a n a, o r a n g e}$ and the set ${o r a n g e, a p p l e, b a n a n a}$ are literally identical. The only reason the elements are even written in an order at all is because there’s no other way to display text.

To get our elements in order, so that we can show where we started from and where we ended up, we use something called an ordered pair, which you’ve probably seen from doing Cartesian coordinates. When we have a map telling us to go from $a$ to $x$ , we represent that as an ordered pair $(a, x)$ . The ordered pair means “we started at $a$ and ended up at $x$ .”

Although sets don’t have orders, we can have sets of ordered pairs (since we can put pretty much anything into sets), and in that way we can represent order in a set. For example, you can have the set consisting of just the ordered pair $(a, x)$ . That would be written ${(a, x)}$ .

So what does it mean to define a map from $A$ to $X$ ? It means defining a set of ordered pairs, the first part of the ordered pair coming from the set $A$ and the second part of the ordered pair coming from the set $X$ .

That is to say, a map $f : A \to X$ is defined as a set whose elements are ordered pairs $(∙_{A}, ∙_{X})$ , where $∙_{A}$ is an element in $A$ and $∙_{X}$ is an element in $X$ .

So for example, we could have a map with the following directions: ${(a, x), (b, y), (c, z)} .$ This maps says, “If you’re in $a$ , go to $x$ . If you’re in $b$ , go to $y$ . If you’re in $c$ , go to $z$ .” All such maps are called binary relations, because they define relationships between pairs of things. For example, the map just given defines relationships between $a$ and $x$ , $b$ and $y$ , and $c$ and $z$ .

We could define all sorts of maps based on this definition. You could make “partial” maps that tell you where to go if you’re in $a$ , but not if you’re in $b$ or $c$ . You could make “choose your own adventure” maps that have forking directions, e.g., a map having both $(a, x)$ and $(a, y)$ in them.

What’s the best map? That is unquestionably a special kind of binary relation known as a function. “Best” might be a strong word, but functions have two special properties that have made it the most important type of map, and indeed morphism, in all of mathematics.

The first property of functions is that they provide you instructions for going from $A$ to $X$ for every “city” in $A$ . Let’s move away from the countries-and-cities metaphor and consider the study of science. Think of the elements of $A$ as possible states of the world. As scientists, we’d like a scientific rule that gives us predictions for every possible state of the world, i.e., something that provides a map for every element of $A$ . That’s something a function does—this property is called total.

The second property of functions is something called univalence, which means that you only get one mapping for every city you could be starting from. That is to say, if your function tells you to do $(a, x)$ and $(a, y)$ , it must be the case that $x$ and $y$ are completely identical, i.e., $x = y$ . Having a map to two different cities starting from the same city is strictly disallowed by univalence.

Let’s relate univalence to science as well. Basically, it captures the idea of determinism. If a state of the world yields a particular output, and then you reset things to that exact state of the world again, it had better yield the same output again. I.e., if you have a state of the world $a$ , and you observe that $a$ yields output $x$ and also yields output $y$ , the outputs $x$ and $y$ had better be the same output that accidentally got two different labels applied to it.

So between totality and univalence, a function basically captures the idea of “making deterministic predictions for everything.” Which is exactly what science is all about.

We can combine totality and univalence into this single rule: A function $f : A \to X$ is a set of ordered pairs $(∙_{A}, ∙_{X})$ where each element of $A$ shows up in one and only one ordered pair. That is to say, $a$ is definitely in an ordered pair, as guaranteed by totality. But by univalence, it will only show up once: if you have $(a, x)$ , then you won’t also have $(a, y)$ .

You should know that while a function can’t give you $(a, x)$ and $(a, y)$ unless $x = y$ , it can totally give you $(a, x)$ and $(b, x)$ . That would just be saying that two states of the world end up at the same point, which certain seems like a scientific possibility.

Now we’re going to address sets and functions from a category theoretic perspective. In fact, we’re going to make a category.

***

Let’s build a category whose objects are sets and whose morphisms are functions.

The first step is to make sets into objects. We do this by saying that sets are objects. Seriously, it’s that simple—objects don’t really have any properties at all aside from their morphisms, so there’s nothing more to this step.

The next step is to make our functions into morphisms. We do this by saying that functions are morphisms, and then we check that functions obey the rules of morphisms in a category.

First, domain and codomain. Sets serve as the domain and codomain of functions, and since sets are our objects, the functions will clearly have the objects of this category as their domain and codomain.

So the first real thing to figure out is composition. Let’s say we have $A, B,$ and $C$ as sets, and functions $f : A \to B$ and $g : B \to C$ . Composition would require that we have another function $h : A \to C$ such that $h = g \circ f$ .

Let’s break this requirement down. The function $f$ starts with the elements in $A$ and returns some elements in $B$ . That is to say, we have a set of ordered pairs of the sort $(a, b)$ , where $a$ comes from $A$ and $b$ comes from $B$ . Say that $A$ consists of ${a_{1}, a_{2}, a_{3}}$ and B is ${b_{1}, b_{2}, b_{3}, b_{4}}$ . The function $f$ allows us to specifically assign an element in $B$ to each element of $A$ . That is to say, we can ask, what is the specific (i.e., only one) element of $B$ corresponding to $a_{1}$ ? It could be $b_{2}$ , for example. If so, then $f (a_{1}) = b_{2}$ . And we can ask the question next for $a_{2}$ and $a_{3}$ . We might reuse elements of $B$ or not. Let’s say the full function $f$ gives ${(a_{1}, b_{2}), (a_{2}, b_{2}), (a_{3}, b_{1})}$ .

Having assigned elements of $B$ to elements of $A$ in this way, we could think of elements of $A$ as “hiding in” elements of $B$ . For example, $a_{1}$ is hiding in $b_{2}$ . (That is to say, the function f is hiding $a_{1}$ in $b_{2}$ —it doesn’t get to hide there automatically.)

Next we have $g : B \to C$ , which specifically assigns an element in $C$ to each element of $B$ . Say that $C$ consists of ${c_{1}, c_{2}}$ . Let’s say $g$ gives ${(b_{1}, c_{2}), (b_{2}, c_{1}), (b_{3}, c_{1}), (b_{4}, c_{1})}$ .

Now let’s reveal our hidden soldiers. The elements $a_{1}$ and $a_{2}$ were hiding in $b_{2}$ , and $a_{3}$ was hiding in $b_{1}$ . Naturally, they ambush the soldiers of $C$ like so: ${(a_{1}, c_{1}), (a_{2}, c_{1}), (a_{3}, c_{2})}$ .

(Why is this ambush allowed? Because $(a_{1}, b_{2})$ means $f (a_{1}) = b_{2}$ , and $(b_{2}, c_{1})$ means $g (b_{2}) = c_{1}$ . Substituting $f (a_{1})$ for $b_{2}$ , we have $g (f (a_{1})) = c_{1}$ .)

Is that “ambush” set a function? Yes, it has each element of $A$ in an ordered pair, and each element is in only one pair. Is it a function $A \to C$ ? Yes, the “first part” of each ordered pair comes from $A$ and the “second part” of each comes from $C$ . Can we call this function $h$ ? Yes, we just label it that for free. Is this function $h$ the same as doing $f$ first, and then $g$ ? Yes, that’s exactly what we just saw.

So functions compose. Check mark.

Now let’s prove that these functions compose associatively. Say you have functions $D i \to E j \to F k \to G$ . We want to show that $(k \circ j) \circ i = k \circ (j \circ i)$ .

Let’s plug in an arbitrary element $d$ of $D$ . Our functions carry that element through to some element of $G$ , and we want to know if it’s the same regardless of how you group the functions. So let’s see if

(k \circ j) \circ i (d) = k \circ (j \circ i (d)) .

We know that composition is allowed (we’re proving that composition is associative, after all). So let’s compose. Say that $k \circ j = y : E \to G$ and $j \circ i = x : D \to F$ . Now we can simplify to asking if $y \circ i (d) = k \circ x (d)$ .

Well, what is $y \circ i (d)$ ? It’s a mapping of an element from $D$ through $i$ to $E$ , and from $E$ through $y$ to $G$ . And $y$ is equal to the path that going through from $E$ through $j$ to $F$ , and from there through $k$ to $G$ . So overall, $y \circ i (d)$ maps an element from $D$ through $i$ to $E$ , through $j$ to $F$ , and through $k$ to $G$ .

And what is $k \circ x (d)$ ? It’s a mapping of an element from $D$ through $x$ to $F$ , and from $F$ through $k$ to $G$ . And since $x$ is equal to the path that goes from $D$ through $i$ to $E$ , and from there through $j$ to $F$ . So overall, $k \circ x (d)$ maps an element from $D$ through $i$ to E, through $j$ to $F$ , and through $k$ to $G$ .

Which is exactly what we just said about $y \circ i (d)$ . Because they’re both carrying the same element through the same functions, they have to end up at the same element in $G$ on pain of violating univalence. So they’re equal for the case of $d$ , and since $d$ is an arbitrary element in an arbitrary set, it works in general. Thus, composition is associative.

Finally we need each set to have an identity morphism. Because our morphisms are functions, this will be the identity function. This is as simple as asking whether, for any set $A$ , we can define a function $1_{A} : A \to A$ that does nothing.

Here’s an example of such a function. Say $A = {a_{1}, a_{2}, a_{3}}$ . Then a function $A \to A$ would do nothing if it gave ${(a_{1}, a_{1}), (a_{2}, a_{2}), (a_{3}, a_{3})}$ . That is to say, each element just gets mapped back to itself.

Let’s show that does nothing. I.e., if you have a function $g : A \to B$ , the composition $1_{A} \circ g$ is just equal to $g$ . Well, duh! The function $g$ is a mapping of elements from $A$ to $B$ . So if you keep all the elements where they were in $A$ , then $g$ is just going to be what $g$ was going to be if you didn’t do anything in the first place.

Obviously, you can just pair up each element with itself for any set, and that’s going to keep not doing anything no matter how big the set gets. So every set has an identity function. (And only one, at that—there’s only one way to pair up each element with itself.)

And now we’re done. We’ve just shown how sets and functions can be considered a category: sets are objects by declaration, and we can show that functions between sets compose associatively, and an identity function can be defined for every set.

Neat!

You might be wondering if we could have defined a category with sets as objects and binary relations as morphisms. Indeed we can, in pretty much the same way as we did with sets and functions. In fact, since functions are just particular types of binary relations, proving that binary relations meet the rules of a category in general would have proved it for the specific case as well. In fact, the category of sets and functions is a subcategory of the category of sets and binary relations.

Yet it is the case that the category of sets and functions gets the significant name Set, whereas the category of sets and binary relations gets the much less interesting name Rel. That’s because it is the category-theoretic perspective that everything that’s interesting about a category is contained in its morphisms. Functions are basically the most important morphism in mathematics, so we give the category for which functions are morphisms the name Set—we care about sets because, more than anything, they let us define functions.

***

One last note on defining the category of sets, Set. You may have heard of Russell’s paradox, which says you can’t have the set of all sets. That’s because sets have to be defined according to a rule, as we said in the beginning. What if you try to define a set of all sets that are not elements of themselves? Then one possibility is that this set is either an element of itself, in which case it is included, in which case we have violated the rule just laid down. But the only other possibility is that it is not an element of itself, in which case it should be an element of itself according to its rule. But we just saw why we can’t let it be an element of itself. So we bounce back and forth in eternal paradox.

So we can’t have a set of all sets. Then how can we have a category of all sets? We’ll discuss that in the next post, which will address this issue and use our new understanding of sets to more properly formalize the definition of a category given a couple of posts ago.

Additionally, we’ll look at two other interesting things about sets. We’ll see how, although sets are bags of black dots, we can use functions to give those black dots meaning. It’s the category-theoretic view that everything you want to know about an object is found in the object’s morphisms. Furthermore, we’ll also see that there’s two special kinds of sets which can be though of as the “minimal” and “maximal” set, respectively. In doing so, we’ll get our first tastes of the two main goals of this series, the Yoneda lemma and adjoint functors.

***

Incidentally, writing these posts has been shockingly exhausting and time-absorbing. So after this next post, I don’t expect anything further on the weekend, although I may try to answer comments then. Five posts a week is not sustainable. 2-3 a week is probably more reasonable. This experience has given me a lot of respect for people who keep up daily blogs or write books. Thanks very much to people reading and commenting on these so far. It’s very useful to have feedback and gratifying to know that people are interested in category theory.

I am going to work as well on lowering the cost to myself of creating drawings to aid explanations. I have always explained this material in the past with easy access to pen and paper, and somehow it never quite occurred to me that even sketching out a few dots and arrows is much more of a pain on a computer. Taking recommendations, if you have any.