Recreating logic in type theory

This is the second entry in my series on type theory. In this post, we will talk about how to recreate logic in type theory.

As you might know, in order to define ZFC set theory, you first define first-order (FO) logic separately, and then specify the ZFC axioms in the language of FO logic. But we want to be more ambitious here! We want to encode FO logic with the type concepts we already have! If that doesn’t make any sense right now, I hope at the end of this article it does.

But before we get there, a quick refresher on type-valued functions.

Type-valued functions

In the definition of sum types, we briefly mentioned that $B (t)$ is a type-valued function:

(t, x) : \sum t : A B (t) .

We can of course precisely specify $B$ ’s type:

B : \prod t : A U .

What is maybe surprising about this is that an instance of a type, like the function $B$ , becomes part of a type. But this is how it works in type theory. In fact, we could have written the type of all groups a little differently:

\begin{matrix} f (T) & :\equiv (\prod x : T \prod y : T T) \times T G r o u p & :\equiv \sum T : U f (T) \end{matrix}

The function $f$ here doesn’t only return a type; it also takes a type as input! A function on types can use the operators $\sum$ , $\prod$ , $+$ , and $\times$ , and can get quite complicated. And, given any function $f$ with codomain $U$ (and arbitrary domain $X$ ), the following is a type: $\sum_{x : X} f (x)$ , namely, the disjoint union over all types in the image of $f$ .

What to use for truth values?

OK, why do I mention all this? Because it will be relevant to how we will represent truth values (booleans in programming languages) in our encoding of logic.

The truth values in first-order logic are, shockingly, true and false. Propositions are statements that have a truth value – they’re either true or false. (So, boolean variables/expressions in a programming language.)

And as mentioned in the beginning, we can talk about subsets of, say, $N$ by talking about predicate functions (or just predicates) on $N$ . A predicate function is a function that takes a single argument and returns a truth value – true or false.

So, you see, truth values are needed everywhere. How to represent them?

Our first naïve approach might perhaps be to represent “true” and “false” with the values of the type $2$ , which is the type that has exactly two elements. Let’s call them $0_{2}$ and $1_{2}$ (distinct from $0 : N$ and $1 : N$ ). So, let’s say $0_{2}$ represents “false” and $1_{2}$ represents “true”.

A predicate function $ϕ$ on the natural numbers would then have the type:

ϕ : \prod n : N 2 .

So, if $ϕ$ represents some property we care about, like whether a number is even, and we want to prove that the number 6 is even, then what we want to show at the end is

ϕ (6) \equiv 1_{2} .

However, doing it like this doesn’t really play to the strengths of type theory. We would have to manually define all the logic operations on $2$ , which isn’t really better than how it’s done in set theory. We would like to re-use our type constructs in the encoding of logic.

So, what will we do instead?

Types for truth values

We will not use two values for “true” and “false”, but two types! Of course! Specifically, the empty type and the unit type.

The empty type has, as the name suggests, no elements at all. It can be represented by $0$ (a bold zero) or by $⊥$ . In a computer program, this could be seen as the return type of a function that crashes and never returns at all. In some programming languages, this type is called “Never”. (Note that this is different from the “nil” or “None” types in programming languages; those correspond to the unit type which we’ll talk about below.) The defining quality of the empty type is that for any other type $T$ , there exists a unique function mapping from $0$ to $T$ :

{r e c}_{0} : \prod T : U \prod x : 0 T

And this function is the “empty function” – there is no $x : 0$ after all. The empty function is quite weird, but unfortunately we will encounter it quite a lot. It just maps nothing to nothing. If you think of a function as a list of input-output pairs, then the empty function is a list with zero elements.

The unit type has exactly one element (I mentioned it once before) and is written $1$ (a bold one) or $⊤$ . The one element that is in $1$ is written as “ $⋆$ ” or “()”; the latter is meant to be a tuple without any entries.^[1] We will use “ $⋆$ ” because we have enough parentheses as it is. The unit type has the property that for any other type $A$ , there exists a unique function from $A$ to $1$ (so exactly the opposite of the property of the empty type).^[2] That function is the constant function which returns $⋆$ for all inputs, $(λ x . ⋆)$ . We won’t make much use of this property though.

A predicate is then a type-valued function (hence the refresher!) on some domain $X$ :

ϕ : \prod x : X U .

More specifically, a predicate function returns either $0$ or $1$ . Note: mapping to either $0$ or $1$ is very different from mapping to an element of either $0$ or $1$ ; after all, $0$ doesn’t have any elements.

And if $ϕ$ is again the predicate for even numbers, then we’ll have, for example, $ϕ (6) \equiv 1$ . Now, this might look suspiciously similar to our “naïve” solution based on $2$ , so what is the advantage? To unlock those advantages, we’ll note that we can express $ϕ (6) \equiv 1$ also as $⋆ : ϕ (6)$ , because if $⋆$ is an element of the type $ϕ (6)$ , then $ϕ (6)$ must be the unit type!

So, if $P$ is some proposition, then we will encode the logic statement

P is true.

⋆ : P is well-typed.

in our type theory, where the proposition $P$ is a type-valued expression. This concept is known as propositions-as-types. If $⋆ : P$ is not well-typed, then $P$ should be interpreted as a false statement.

At this point, I’ll remind you that the point of a logic system is to make truth-preserving transformations. So, if we represent truth with the fact that $⋆$ is a member of the type that represents the proposition, then all our inference steps need to preserve that property or our logic system is useless. We will show in the following that we can make our logic truth-preserving.

I can understand if this feels a bit mind-bending right now. But I hope that you will see shortly that this all makes sense and isn’t a terrible idea. In fact, it will be rather elegant.

Existential quantifier

Now that we have decided how to represent true and false propositions in our theory, we would also like to use the other parts of first-order logic, like existential quantification. Can we again do something really clever with types here?

Let’s see… Say that $ϕ$ is a predicate on the natural numbers with type $\prod_{n : N} U$ . It might represent evenness as before. As $ϕ$ is a type-valued function, we can construct a sum type from it:

(n, p) : \sum n : N ϕ (n) .

Now, what does it mean if there exists some pair $(n, p)$ such that the above is well-typed? Or in other words: what does it mean if the above sum type is inhabited? (We say that a type is inhabited if it has at least one element.)

As you’ll recall, the elements of sum types are pairs. In this case, the first element of the pair is a natural number $n$ , and the second element is an element of either $0$ or $1$ , depending on whether $ϕ (n)$ is $0$ or $1$ . However… $0$ doesn’t have any elements! So actually, all elements of $\sum_{n : N} ϕ (n)$ must have $1$ as the type of the second entry of the pair. In other word, a pair of type $\sum_{n : N} ϕ (n)$ can only exist for those $n : N$ , where $ϕ (n) \equiv 1$ . For those $n$ , the pair is given by $(n, ⋆)$ , where the second entry is $⋆ : 1$ .

So, if I can find some $n$ such that

(n, ⋆) : \sum n : N ϕ (n)

is well-typed, then this must mean that there exists an $n$ such that $⋆ : ϕ (n)$ is well-typed, or equivalently, there exists an $n$ such that $ϕ (n)$ is inhabited.

It’s worth repeating again: if the type $\sum_{x : T} ϕ (x)$ is inhabited, then there exists an $x : T$ such that $ϕ (x)$ is inhabited. So, in a way, the $\sum_{x : T} (\dots)$ operator is acting like an existential quantifier: $\exists (x : T) . (\dots)$ .

However, to really make this work, we’ll have to change our definition of truth a bit.

If we say, “as long as a type is inhabited at all, it represents a true proposition,” then the step from $ψ (y)$ to $\sum_{x : T} ψ (x)$ is truth-preserving. Because if $ψ (y)$ is true (i.e., is inhabited) for some $y : T$ , then $\sum_{x : T} ψ (x)$ is also true (inhabited). We have successfully reasoned that if I have a concrete $y : T$ with property $ψ$ , then I can conclude that there exists an element in $T$ for which $ψ$ holds. This rule is known as existential introduction in first-order logic.

So, in the propositions-as-types system, the sum type operator $\sum$ acts like the existential quantifier. This has been our first step towards encoding all of first-order logic as operations on types in type theory. But don’t be alarmed – when we actually write down propositions later, we will not transcribe them as types every time. Rather, it’s sufficient to know that we could always do that, and that everything we’re doing is well-defined.

As mentioned, we have to tweak our definition of truth a bit, because the members of $\sum_{x : T} ϕ (x)$ actually look like $(x, ⋆)$ and not just $⋆$ . The new definition will just check for inhabitedness.

So,

P is true.

gets encoded as

P is inhabited.

where $P$ is again a type. The members of $P$ are sometimes called the witnesses for the proposition’s truth.^[3]

So, if we encode this first-order-logic statement:

\exists (x : T) . ϕ (x) is true.

in type theory, we get

\sum x : T ϕ (x) is inhabited.

Great!

Universal quantifier

Let’s do the universal quantifier, $\forall$ , next. In order to figure this out, it helps to think about what object would be proof enough that $ϕ (x)$ holds for all elements in $T$ . One such object is a (total) function that maps each $x$ to an element in $ϕ (x)$ , because for a (total) function like that to exist, the types $ϕ (x)$ for all $x : T$ must have at least one element, right? You can’t map a value to an empty type! That is, if there is even one $x : T$ such that $ϕ (x) \equiv 0$ , then no function from $T$ to $ϕ (x)$ can exist. So, the existence of the function is sufficient proof that all $ϕ (x)$ are inhabited.

What is the type of such a witness function? We know that function types are written $\prod_{x : T} B$ , but that assumes that the return type is the same for all $x$ . What we need instead, is a function type where the return type varies with the input value:

f : \prod x : T ϕ (x)

This is called a dependent function type. It’s a bit of a strange type – most programming languages don’t have an equivalent to this, though some very fancy functional programming languages do. It’s usually not that useful, but it is very useful to encode a universal quantifier: $\forall (x : T) . ϕ (x)$ .

One way to think about dependent function types is as a generalization of tuples. For example, take the following tuple type: $A \times B \times C$ . Given an index type that has three elements, $3$ , we can also realize this with a dependent function type. We need a type-valued function $V : \prod_{n : 3} U$ that is defined like this:

V (1_{3}) \equiv A, V (2_{3}) \equiv B, V (3_{3}) \equiv C

With this, an element of $\prod_{n : 3} V (n)$ is equivalent to a 3-tuple.

Another way to see dependent function types is as functions that return a union, except that the type is specified more precisely. We previously discussed the example of the division function on the rational numbers, which returned the special value $⋆ : 1$ for division by zero. But instead of saying that the return type is $Q + 1$ , we can be more precise! We could specify a dependent return type, which is $Q$ for all inputs except where the second argument is 0. So, if

f (0) \equiv 1 and f (y) \equiv Q otherwise

then this is a more precise type for $d i v$ :

d i v : \prod x : Q \prod y : Q f (y) .

However, more realistically, you would probably just restrict the type of the second argument to be $Q_{\neq 0}$ .

A more common use case is “generic” functions. The truth is, we already used a dependent function type once before – in the type of the generic identity function:

i d : \prod T : U (\prod x : T T)

If we define $B (T) :\equiv \prod_{x : T} T$ , then the above type can be written as $\prod_{T : U} B (T)$ . So, this was actually a dependent function all along.

Anyway, we now know that this first-order logic statement:

\forall (x : T) . ϕ (x) is true

can be encoded as

\prod x : T ϕ (x) is inhabited.

Complete first-order logic

Now we only have $\land$ , $\lor$ and $\neg$ left. The first two will be very easy, the last one will be trouble. So, let’s start with the easy ones. Say that types $P$ and $Q$ may or may not be inhabited, what new type could we construct that will be inhabited if and only if both are inhabited? Well, a pair (aka 2-tuple) of course! You can only form a pair out of elements from $P$ and $Q$ if both have elements. So, the logic statement $P \land Q$ will be encoded as $P \times Q$ .

If you take a moment to think about logical or, it should be pretty clear that we can encode $P \lor Q$ as $P + Q$ .

Finally, negation: $\neg$ . If I have a type $P$ that may or may not be inhabited, what new type could we construct that will be inhabited exactly when $P$ is not inhabited and vice versa? The trick that we’ll use for this is that if $P$ is uninhabited, then we can still use it as the domain for the empty function. The empty function maps from $0$ to any type $T$ , and… does just absolutely nothing, as we discussed. So, if $P$ is uninhabited, then $\prod_{p : P} T$ is still inhabited by the empty function for any type $T$ .

However, if $P$ is actually inhabited, then $\prod_{p : P} T$ could still have an element – namely just a normal function that maps $p : P$ to some element of $T$ . So we can’t use $\prod_{p : P} T$ to represent “ $\neg P$ ” because it doesn’t fulfill the requirement that “ $\neg P$ ” be inhabited if and only if $P$ is not inhabited.

But we can fix this by using $0$ as the codomain: $\prod_{p : P} 0$ . If $P \equiv 0$ , this has an element (the empty function), but if there exists $p : P$ , then this can’t have an element because what are you going to map this $p$ to?

So, we encode $\neg P$ as $\prod_{p : P} 0$ . Where is the promised trouble? The trouble is that with this encoding, $\neg \neg P$ does not imply $P$ . To be clear, it works in the other direction: if I have a $p : P$ , then I know that $\prod_{p : P} 0$ is uninhabited, and so this is inhabited:

\prod f : Π_{p : P} 0 0

(by the empty function). This means that $P$ implies $\neg \neg P$ . However, when we try to prove the opposite direction, we run into the problem that we can’t produce an element of $P$ out of nothing – there is just no rule that allows us to construct an element of $P$ out of, essentially, the empty function.

At this point, we must realize the shocking truth, that what we have constructed here is not classical first-order logic, but intuitionistic logic! (Also called constructive logic as it’s used for constructive mathematics.) In intuitionistic logic, proof by contradiction does not work in general and, more pertinently, double-negation elimination is not a thing.

What should have tipped us off from the beginning was that we used two different operators, $\sum$ and $\prod$ , to encode $\exists$ and $\forall$ , because in classical logic, you only need one of them and the other is just the negation of the first. In intuitionistic logic, on the other hand, $\exists$ and $\forall$ are different operators that can’t be defined in terms of each other. For example, in intuitionistic logic, the following does not hold:

\neg (\forall (x : X) . P (x)) \Rightarrow \exists (x : X) . (\neg P (x)) .

The intuitive argument for why this does not hold is that you need to construct a concrete $x$ for which $\neg P (x)$ , but there is no way to do that from just knowing $\neg (\forall (x : X) . P (x))$ .

Intuitionistic logic has its charms, but it would be nice to be able to use classical logic as well. In order to do that, we will simply assume double-negation elimination as an axiom! As it turns out, such an axiom causes problems with the encoding of $\exists$ and $\lor$ as $\sum$ and $+$ , but this is a good sign! Because those are not actually needed to define classical logic. It suffices if we have $\forall$ , $\land$ , and $\neg$ .

It might seem a bit unsatisfactory that we have to assume an additional axiom in order to get classical logic, but on the other hand, it’s remarkable that we got even this far without assuming anything specific to logic. Also, we will later see that double-negation elimination follows from the axiom of choice, which we probably want to assume anyway, so this isn’t so bad.

Which is interesting: in a deep way, it is the axiom of choice which allows us to do classical logic.

Anyway, before we define the axiom, let’s introduce some simplified notation.

Function types revisited

At this point I should come clean and admit that type theorists don’t actually write their function types like this: $\prod_{x : X} Y$ . If the codomain is a constant type, and not a type-valued function, then function types are usually written in this simplified notation:

X \to Y :\equiv \prod x : X Y .

So, a function from $X$ to $Y$ can be written as $f : X \to Y$ . However, if it’s a dependent function, you still have to write its type as $\prod_{x : X} B (x)$ .

(It’s actually interesting that “ $f : X \to Y$ ” with the colon is a common notation even outside of type theory because if you use set theory as your foundation, then I think this should actually be “ $f \in (X \to Y)$ ”, but I guess in set theory you’re not supposed to read “ $X \to Y$ ” as a set with elements, like we do in type theory where $X \to Y$ is just a shorthand for $\prod_{x : X} Y$ .)

With this simplified notation, we can write the negation of $P$ as $P \to 0$ . And the type of the generic identity function is this:

i d : \prod T : U T \to T .

Note that for the first argument (the type parameter) we still need the standard notation, because the other types depend on the value of that first argument.

Functions that have multiple arguments and use currying, are written like this:

a d d : R \to (R \to R) .

However, we usually leave out the parentheses if their scope extends to the end of the expression:

a d d : R \to R \to R .

As the arrow notation may already suggest, we can encode logical implication as a function type. If a function $f : P \to Q$ exists, and $P$ is inhabited by, say, $p : P$ , then we can simply apply the function, $f (p)$ , to get an element of $Q$ and conclude that $Q$ is inhabited as well. Conversely, if we have $f : P \to Q$ , but $P$ is not inhabited, then we can’t conclude anything about $Q$ from this (except that $f$ must be the empty function). So, $P \to Q$ behaves exactly like logical implication does.

Law of excluded middle

Our goal was to recreate classical first-order logic within type theory. The main problem was that double-negation elimination (abbreviated DNE) didn’t hold. The idea was then to assume DNE as an axiom; i.e., to assume that the following type is inhabited:

D N E :\equiv \prod A : U ((A \to 0) \to 0) \to A .

Before we do that, we’ll note that, given all the other constructions we already have, DNE is equivalent to the law of excluded middle (abbreviated LEM):

L E M :\equiv \prod A : U A + (A \to 0) .

LEM says: for any proposition $A$ , we have either $A$ or $\neg A$ .

In order to better get to know DNE and LEM, let’s verify that they are indeed equivalent in our logic system. Going from LEM to DNE is easier, so let’s do that first.

(The proof will showcase how constructive mathematics works in type theory. In order to prove a proposition, we’ll have to explicitly construct an element of the corresponding type! This style of reasoning certainly takes some getting used to.)

So, we are given a function of type LEM, which, for any type $A$ , gives us an element $^a : A + (A \to 0)$ . We want to construct a function of type DNE, that maps the function $g : (A \to 0) \to 0$ to an element of $A$ .

Given $^a$ , an element of a union type, we can do a case analysis. If $^a$ corresponds to an element of $A$ , we’re already done: that’s what we wanted.

If $^a$ corresponds to an element of $A \to 0$ , then we can map this with $g$ to an element of $0$ . Getting an element of $0$ is always exciting, because as I mentioned when introducing the empty type, there is a function $0 \to C$ for any type $C$ , which means there is also one for $A$ ; call it $f : 0 \to A$ . So, if we have an element of $0$ , we can also get an element of $A$ . Of course, (assuming our theory is consistent) there are no elements of $0$ , so (again assuming our theory is consistent) there can be no element of $A \to 0$ which we could then map to $0$ with $g$ , so the element of $A + (A \to 0)$ must have been an element of $A$ all along. But in any case, we have conclusively demonstrated that we can construct an element of $A$ , given $^a : A + (A \to 0)$ and $g : (A \to 0) \to 0$ . Thus, LEM implies DNE.

The fact that we can get anything from an element of $0$ corresponds to the principle of explosion in classical logic.

The opposite direction – from DNE to LEM – is slightly trickier, but it’s also a short proof. First, we’ll have to convince ourselves that, regardless of DNE, for any type $B$ , a function of the following type exists:

f : ((B + (B \to 0)) \to 0) \to 0

So, it’s a function that takes in another function – let’s call it $g$ – and maps that to the empty type, $0$ . $g$ in turn is a function mapping an element from $B + (B \to 0)$ to $0$ . Why do we want this function $f$ ? Because if we had this, then by DNE (double-negation elimination), we would have an element of $B + (B \to 0)$ , which is exactly what LEM says exists. So, let’s explicitly construct $f$ ! We know that it gets $g : B + (B \to 0) \to 0$ as input:

f (g) \equiv ?

So, we will probably want to call $g$ at some point. However, for that we need an element of $B + (B \to 0)$ . We won’t be able to just construct an element of $B$ , but $B \to 0$ seems potentially doable. In fact, can’t we use $g$ itself to construct an element of $B \to 0$ ? Yes, we can!

(λ b . g (l e f t (b))) : B \to 0

As a reminder, $l e f t$ allows us to construct an element of $B + (B \to 0)$ from a $b : B$ . Great, now we’ll just pass this function to $g$ and we’ll have it:

f (g) :\equiv g (r i g h t (λ b . g (l e f t (b)))) : ((B + (B \to 0)) \to 0) \to 0

So, we managed to construct $f$ for an arbitrary type $B$ , which means that the double-negated version of LEM is already always inhabited in our theory. Now we just apply DNE to it and get the regular LEM.

This concludes our first proof in constructive mathematics; I hope it wasn’t too bad!

To construct classical logic we can now just assume either DNE or LEM, right? Unfortunately, there is one last slight complication, which we will discuss in the next post!

To summarize what we have learned: we can use types as truth values, which then allows us to re-use $\prod$ , $\sum$ , $\times$ , … as $\forall$ , $\exists$ , $\land$ , … However, this only gives us intuitionstic logic, so we’ll need to assume an axiom in order to get classical logic.

^
Haskell, for example, writes the single element of the unit type as (). However, the unit type itself is also written as (), so () :: () in Haskell.
^
For the category theorists: if we view types as objects in a category, then this means that the empty type is an initial object and the unit type is a terminal object.
^
Or, more old-fashioned, they’re called the proofs of the proposition, which is why we use the letter $p$ to represent them.