The Law of Identity

Summary: When we define the range of possible values for a variable X, we are fixing an ontology, that is, a way of carving up the space of values. The Law of Identity asserts that this ontology respects a given equivalence function.

Wikipedia defines the Law of Identity as follows: “In logic, the law of identity states that each thing is identical with itself”. It is often written as X=X.

While this law seems straightforward, it is anything but, once we start digging into what it actually means. The challenge is that it’s very difficult to say what this law means without stating a tautology.

Take, for example, the definition above. What does it mean for a thing (let’s say A, to be concrete) to “be identical with itself”?

Well, for this to make sense, we need a model in which A is not identical to itself, so that we have something to reject. If we have no such model to reject, then the statement is tautological.

We can represent this using set theory as follows:

  1. Let A be a set containing a_1 and a_2 (i.e. A = {a_1, a_2}). Here A is the “thing” and we’re assigning it two separate sub-things that “are it” so that we can talk about them being equal to each other or not. We can think of A as corresponding to a congruence relation in set theory.

  2. When checking if A is identical to itself, we’ll be checking if a_1 = a_2. If there were more than two variables, then we would do a pairwise comparison of elements. If, instead of numbers, they are something like formulas applied to a specific value, then we’ll have to evaluate them before making the comparison.
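
As a rough sketch of the two steps above, the pairwise check could look like the following in Python. The function name is_identical, and the choice to represent each sub-thing as a zero-argument callable that is evaluated before comparison, are assumptions made for illustration rather than part of the original argument.

```python
from itertools import combinations
from typing import Callable, Iterable


def is_identical(sub_things: Iterable[Callable[[], object]]) -> bool:
    """Return True if every pair of evaluated sub-things compares equal.

    Each sub-thing is a zero-argument callable (hypothetical encoding), so
    formulas are evaluated before the comparison, as described in step 2.
    """
    values = [thing() for thing in sub_things]
    return all(left == right for left, right in combinations(values, 2))


# A = {a_1, a_2} where both sub-things evaluate to 100: identity holds.
reliable = is_identical([lambda: 100, lambda: 10 ** 2])

# One sub-thing evaluates to 101: identity fails for this way of carving up A.
unreliable = is_identical([lambda: 100, lambda: 101])
```

With fewer than two sub-things there are no pairs to compare, so the check passes vacuously, which is one way of seeing why the law looks tautological when there is no alternative model to reject.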

We can now consider some concrete examples (apologies if the examples are repetitive):

  • Let’s suppose a_1 = 100 represents a variable that we have in memory. We copy it such that we have another variable a_2 in memory and if the copy operation happened successfully it should also be 100. However, we can also imagine an unreliable copy operation which often produces a value of 99 or 101. In this context, A being identical to itself can serve as a shorthand for the copy operation being reliable such that we can treat all copies as a single variable. If operations are regularly unreliable, then we will tend to assign each copy its own variable.

  • Let x be some fixed value and suppose we have two functions f_1 and f_2 that are meant to perform the same operation. Similar to how we made A = {a_1, a_2}, let’s define f = {f_1, f_2}. Suppose f represents different attempts to apply the operation with f_1 and f_2 being two separate attempts to apply it to the same value of x. If our calculation is reliable, then f_1(x) should equal f_2(x), however if f_1 correctly calculates 100 and f_2 incorrectly calculates 1,000, similar to how a human might mess up, then they will differ. So here, f being identical to itself can serve as a shorthand for the operation providing consistent answers. If operations are regularly unreliable, then we will tend to assign each run its own variable.

  • Let B represent different observations of the number of bananas in front of me and b_1 and b_2 represent the number of bananas in two specific cases. If I look once and see two bananas, then look a second time and see an extra banana has appeared, then I’ll need two separate variables to represent the number of bananas. But if we’ve limited the scope of our consideration such that there are only ever three bananas in front of me, I can use any member of B. Another situation where I might need two separate variables would be if there were only ever three bananas, but I messed up the count. Here, B being identical to itself represents that multiple observations should return the same answer. If the number of bananas was changing, or my observations of how many bananas were in front of me weren’t consistent, then I’d need two separate variables.

  • We can imagine a similar situation as the last point, but instead of making observations of the same aspect of the world which we believe to be constant, we could be accessing the same value in memory. This is similar to the first point, but we’re considering multiple accesses of a variable stored in the same location, rather than different locations.
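
The copy example from the first bullet can be sketched in Python as below. The helper names reliable_copy, noisy_copy and collapse, and the fixed off-by-one error, are hypothetical. The only point is that when every copy agrees we can collapse the copies into a single value, and when they don't we have to keep one variable per copy.

```python
from typing import List, Union


def reliable_copy(value: int) -> int:
    """A copy operation that always reproduces its input."""
    return value


def noisy_copy(value: int, error: int) -> int:
    """A hypothetical unreliable copy whose result is off by `error`."""
    return value + error


def collapse(copies: List[int]) -> Union[int, List[int]]:
    """Return one value if all copies agree, otherwise keep them separate."""
    if len(set(copies)) == 1:
        return copies[0]  # coarse-grained: a single variable suffices
    return copies  # fine-grained: one variable per copy


a_1 = 100
good = collapse([a_1, reliable_copy(a_1)])        # copies agree
bad = collapse([a_1, noisy_copy(a_1, error=1)])   # copies disagree
```

The same shape would fit the other bullets if the list held repeated computations, observations or memory reads instead of copies.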

I could keep going and listing different scenarios, but as we can see the Law of Identity is actually pretty complicated underneath and can represent quite different things in different scenarios.

In each scenario, we had a variable that could potentially be further sub-divided (i.e. by indexing on the copy, computation, observation or retrieval). We discussed the conditions when it would make sense to work with the coarse-grained variable (represented by the set) and when it would make sense to work with the fine-grained variables (represented by individual variables). In this article, we considered numerical equivalence, but it works with other kinds of equivalence as well.

Consequences

I think understanding the Law of Identity is a pretty good starting point for trying to understand the nature of logic and mathematics. After all, it’s pretty much the simplest law out there. And if you don’t have a clear explanation of this law, then that might be a hint that you’re not ready to tackle the deeper questions yet.

I guess that if we took this understanding of the Law of Identity and tried to extrapolate it out to a theory of logic, the natural way to do this would be to produce a non-Cartesian view of logic, where logic describes an abstraction for thinking about the agent’s interactions with the world and/or the agent’s understanding of its own cognition.

Let me know if you think I should write something about some of the other basic axioms of logic, but to be honest, I’m not really planning to do so at the moment, as I think extending this kind of reasoning to those axioms should be relatively straightforward.

Addendum:

Justis suggested adding the following example to further clarify when the Law of Identity doesn’t hold: “Suppose you’re playing a game and a certain enemy always drops one of three things, but the precise thing varies randomly. It could drop a gold, silver, or bronze coin, for example. Then enemyDrop() != enemyDrop(), so identity in some sense doesn’t hold on invocation.”

On the other hand, you may have enemyDrop(seed) == enemyDrop(seed). So again, the law only holds when your variable is sufficiently fine-grained.
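
A minimal Python sketch of this contrast, assuming the drop is chosen by a pseudo-random generator: enemy_drop and enemy_drop_unseeded are hypothetical stand-ins for the enemyDrop(seed) and enemyDrop() calls above, and the DROPS tuple is likewise made up for illustration.

```python
import random

# Hypothetical drop table for the enemy described above.
DROPS = ("gold", "silver", "bronze")


def enemy_drop(seed: int) -> str:
    """Pick a coin using a fresh generator determined only by `seed`."""
    return random.Random(seed).choice(DROPS)


def enemy_drop_unseeded() -> str:
    """Pick a coin from the shared global generator; results vary per call."""
    return random.choice(DROPS)


# Indexing on the seed makes the variable fine-grained enough for identity.
same_with_seed = enemy_drop(42) == enemy_drop(42)

# Without the seed, repeated calls are not guaranteed to agree
# (they can still coincide by chance on any given pair of calls).
unseeded_results = {enemy_drop_unseeded() for _ in range(100)}
```

Here the seed plays the same role as the index on a copy, computation or observation earlier in the article.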