*Epistemic status: Vaguely confused and probably lacking a sufficient technical background to get all the terms right. It's very cool though, so I figured I'd write this.*

> And what are these Fluxions? The Velocities of evanescent Increments? And what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the ghosts of departed quantities?
>
> — George Berkeley, *The Analyst*

When calculus was invented, it didn't make sense. Newton and Leibniz played fast and loose with mathematical rigor to develop methods that arrived at the correct answers, but no one knew why they worked. It took another century and a half for Cauchy and Weierstrass to develop analysis, and in the meantime people like Berkeley refused to accept methods built on these "ghosts of departed quantities."

Cauchy's and Weierstrass's solution to the crisis of calculus was to define infinitesimals away in terms of limits. In other words, not to describe the behavior of functions directly acting on infinitesimals, but rather to frame the entire endeavour as studying the behavior of certain operations in the limit, in that weird superposition of being arbitrarily close to something yet never it.

(And here I realize that math is better shown, not told)

The limit of a function \(f\) at \(a\) is \(L\) if for any \(\varepsilon > 0\) there exists some \(\delta > 0\) such that if

\(0 < |x - a| < \delta\)

then

\(|f(x) - L| < \varepsilon.\)

Essentially, the limit exists if there's some value \(L\) such that \(x\) being within \(\delta\) of \(a\) forces \(f(x)\) to be within \(\varepsilon\) of \(L\). Note that this has to hold true for all \(\varepsilon > 0\), and you choose \(\varepsilon\) first!

From this we get the well-known definition of the derivative:

\(f'(x) = \lim_{\Delta x \to 0} \frac{f(x + \Delta x) - f(x)}{\Delta x}\)

and you can define the integral similarly.

The limit solved calculus’s rigor problem. From the limit the entire field of analysis was invented and placed on solid ground, and this foundation has stood to this day.

Yet, it seems like we lose something important when we replace the idea of the “infinitesimally small” with the “arbitrarily close to.” Could we actually make numbers that were *infinitely small?*

## The Sequence Construction

Imagine some mathematical object that has all the relevant properties of the real numbers (addition and multiplication are associative and commutative, the set is closed under both, etc.) but also contains infinitely small and infinitely large numbers. What does this object look like?

We can take the set \(\mathbb{R}^{\mathbb{N}}\) of all infinite sequences of real numbers as a starting point. A typical element would be

\((a_1, a_2, a_3, \ldots)\)

where \((a_n)\) is some infinite sequence of real numbers.

We can define addition and multiplication element-wise as:

\((a_n) + (b_n) = (a_n + b_n)\)

\((a_n) \cdot (b_n) = (a_n \cdot b_n)\)

You can verify that this makes \(\mathbb{R}^{\mathbb{N}}\) a commutative ring, which means that these operations behave nicely. Yet being a commutative ring is not the same thing as being an ordered field, which is what we eventually want if our desired object is to have the same properties as the reals.

To get from \(\mathbb{R}^{\mathbb{N}}\) to a field structure, we have to modify it to accommodate well-defined division. The typical way of doing this is looking at how to introduce the zero product property: i.e. ensuring that if \(a \cdot b = 0\) then at least one of \(a, b\) is \(0\).

If we let \(0 = (0, 0, 0, \ldots)\) be the sequence of all zeros in \(\mathbb{R}^{\mathbb{N}}\), then it is clear that we can have two non-zero elements multiply to get zero. If we have

\(a = (1, 0, 1, 0, \ldots)\)

and

\(b = (0, 1, 0, 1, \ldots)\)

then neither of these is the zero element, yet their product is zero.
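To make this concrete, here is a quick Python sketch (truncating the infinite sequences to a finite prefix, an obvious simplification) of the element-wise operations and the zero-divisor pair above:

```python
# Sketch: the ring of real sequences, with infinite sequences
# simulated by their first N terms.
N = 10

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def mul(a, b):
    return [x * y for x, y in zip(a, b)]

zero = [0.0] * N
a = [1.0, 0.0] * (N // 2)  # (1, 0, 1, 0, ...)
b = [0.0, 1.0] * (N // 2)  # (0, 1, 0, 1, ...)

print(a != zero and b != zero)  # True: neither factor is zero
print(mul(a, b) == zero)        # True: yet the product is zero
```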

How do we fix this? Equivalence classes!

Our problem is that there are too many distinct "zero-like" things in the ring of real-numbered sequences. Intuitively, we should expect a sequence that is zero at all but finitely many positions to be *basically* zero, and we want to find a good condensation of \(\mathbb{R}^{\mathbb{N}}\) that allows for this.

In other words, how do we make all the sequences with “almost all” their elements as zero to be equal to zero?

## Almost All Agreement ft. Ultrafilters

Taken from "five ways to say "Almost Always" and actually mean it":

> A *filter* on an arbitrary set \(I\) is a collection of subsets of \(I\) that is closed under set intersections and supersets. (Note that this means that the smallest filter on \(I\) is \(\{I\}\) itself.)
>
> An *ultrafilter* is a filter which, for every \(A \subseteq I\), contains either \(A\) or its complement. A *principal* ultrafilter contains a finite set. A *nonprincipal ultrafilter* does not.

This turns out to be an incredibly powerful mathematical tool, and can be used to generalize the concept of "almost all" to esoteric mathematical objects that might not have well-defined or intuitive properties.

Let's say we fix some nonprincipal ultrafilter \(U\) on the natural numbers. This will contain all cofinite sets, and will exclude all finite sets. Now, let's take two sequences \(a, b\) and define their *agreement set* \(I_{a,b}\) to be the set of indices on which \(a\) and \(b\) are identical (have the same real number in the same position).

Observe that \(I_{a,b}\) is a set of natural numbers. If \(I_{a,b} \in U\) then it cannot be finite, and in the motivating case where \(I_{a,b}\) is cofinite, it seems pretty obvious that almost all the elements of \(a\) and \(b\) are the same (they disagree at only a finite number of places, after all). Conversely, if \(I_{a,b} \notin U\), this implies that \(\mathbb{N} \setminus I_{a,b} \in U\), which means that \(a\) and \(b\) disagree at almost all positions, so they probably shouldn't be equal.

Voila! We have a suitable definition of "almost all agreement": \(a \sim b\) if the agreement set \(I_{a,b}\) is an element of our fixed nonprincipal ultrafilter \(U\).
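A nonprincipal ultrafilter can't be written down explicitly (its existence relies on the axiom of choice), but the agreement set itself is easy to compute on truncated sequences; a small illustrative sketch:

```python
# Sketch: agreement sets of truncated sequences. Changing a sequence
# at finitely many positions leaves a cofinite agreement set, so the
# two sequences land in the same equivalence class under any
# nonprincipal ultrafilter.
N = 1000

def agreement_set(a, b):
    return {n for n in range(len(a)) if a[n] == b[n]}

a = [1 / (n + 1) for n in range(N)]
b = list(a)
b[3] = 42.0            # disagree at finitely many (here: one) positions

disagreements = set(range(N)) - agreement_set(a, b)
print(disagreements)   # {3}: cofinite agreement, so [a] = [b]
```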

Let \({}^*\mathbb{R} = \mathbb{R}^{\mathbb{N}} / \sim\) be the quotient set of \(\mathbb{R}^{\mathbb{N}}\) under this equivalence relation (essentially, the set of all distinct equivalence classes of \(\mathbb{R}^{\mathbb{N}}\)). Does this satisfy the zero product property?

(Notation note: we will let \((r, r, r, \ldots)\) denote the constant sequence of the real number \(r\), and \([(a_n)]\) the equivalence class of the sequence \((a_n)\) in \({}^*\mathbb{R}\).)

## Yes, This Behaves Like The Real Numbers

Let \(a, b \in {}^*\mathbb{R}\) be such that \(ab = 0\). Let's break this down element-wise: at every index \(n\), either \(a_n\) or \(b_n\) must be zero. One of the ultrafilter axioms is that \(U\) must contain any given set or its complement. So either the index set of the zero elements of \(a\) is in \(U\), or its complement is; in the latter case \(b_n\) must be zero on that complement, so the index set of the zero elements of \(b\) is in \(U\). Therefore, either \(a\) or \(b\) is equivalent to \((0, 0, 0, \ldots)\) in \({}^*\mathbb{R}\), so \({}^*\mathbb{R}\) satisfies the zero product property.

Therefore, division is well defined on \({}^*\mathbb{R}\)! (For a nonzero \(a\), the set of indices where \(a_n \neq 0\) is in \(U\), and inverting element-wise on that set gives \(a^{-1}\).) Now all we need is an ordering, and luckily almost-all agreement saves the day again. We can say for \(a, b \in {}^*\mathbb{R}\) that \(a > b\) if almost all elements of \(a\) are greater than the elements of \(b\) at the same positions, i.e. if \(\{n : a_n > b_n\} \in U\).

So, \({}^*\mathbb{R}\) is an ordered field!
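The element-wise heart of the zero-product argument, namely that the zero-index sets of the two factors must together cover all indices, can be checked concretely (again on truncated sequences, purely as an illustration):

```python
# Sketch: if a * b is the zero sequence, then at each index at least
# one factor vanishes, so the two zero-index sets cover everything.
# An ultrafilter then contains one of them (a set or its complement),
# which is what collapses one factor to zero in *R.
N = 10
a = [1.0, 0.0] * (N // 2)
b = [0.0, 1.0] * (N // 2)

Z_a = {n for n in range(N) if a[n] == 0}
Z_b = {n for n in range(N) if b[n] == 0}

print(all(x * y == 0 for x, y in zip(a, b)))  # True: product is zero
print(Z_a | Z_b == set(range(N)))             # True: zero sets cover N
```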

## Infinitesimals and Infinitely Large Numbers

We have the following hyperreal:

\(\epsilon = [(1, \tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{4}, \ldots)]\)

Recall that we embed the real numbers into the hyperreals by assigning every real number \(r\) to the equivalence class \([(r, r, r, \ldots)]\). Now observe that \(\epsilon\) *is smaller than every positive real number embedded into the hyperreals this way.*

Pick some arbitrary positive real number \(r\). There exists a natural number \(N\) such that \(\tfrac{1}{N} < r\). All but finitely many fractions of the form \(\tfrac{1}{n}\) have \(n\) greater than \(N\), so \(\epsilon\) is smaller than \(r\) at almost all positions, and therefore \(\epsilon < [(r, r, r, \ldots)]\).

This is an infinitesimal! This is a rigorously and coherently defined *infinitesimal number* smaller than all positive real numbers! In a number system which shares all of the important properties of the real numbers! (Except the Archimedean one, as we will shortly see, but that doesn't really matter.)
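Numerically, the argument above is just the statement that \((1/n)\) has only finitely many terms at or above any fixed positive real. A sketch using exact rational arithmetic:

```python
# Sketch: for any real r > 0, the set {n : 1/n >= r} is finite
# (it contains only n <= 1/r), so {n : 1/n < r} is cofinite and
# lies in every nonprincipal ultrafilter -- hence eps < r in *R.
from fractions import Fraction

def exceptions(r, N=10_000):
    """1-based indices n in the prefix where 1/n >= r."""
    return [n for n in range(1, N + 1) if Fraction(1, n) >= r]

for r in (Fraction(1, 2), Fraction(1, 100), Fraction(1, 1000)):
    print(len(exceptions(r)))  # 2, then 100, then 1000: always finite
```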

Consider the following:

\(\Omega = [(1, 2, 3, 4, \ldots)]\)

By a similar argument, this is larger than all possible real numbers. I encourage you to try to prove this for yourself!

(The Archimedean principle is the one which guarantees that if you have any two positive real numbers, you can multiply the smaller by some natural number to make it greater than the other. This is not true in the hyperreals. Why? (Hint: \(\Omega\) breaks this if you take the other number to be a real number.))

## How does this tie into calculus, exactly?

Well, we have a coherent way of defining infinitesimals!

The short answer is that we can define the *standard part* operator \(\operatorname{st}\), which maps any finite hyperreal to its closest real counterpart. Then, the definition of a derivative becomes

\(f'(x) = \operatorname{st}\left(\frac{{}^*f(x + \Delta x) - {}^*f(x)}{\Delta x}\right)\)

where \(\Delta x\) is some nonzero infinitesimal, and \({}^*f\) is the natural extension of \(f\) to the hyperreals. More on this in a future blog post!
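There's no way to compute with an actual ultrafilter, but representing \(\Delta x\) by the sequence \((1/n)\) and looking at the tail of the resulting difference-quotient sequence gives a rough numerical picture of the standard part (the function names here are mine, purely for illustration):

```python
# Sketch: the hyperreal difference quotient for f(x) = x^2, with the
# infinitesimal Delta-x represented by the sequence (1/n). Entry n of
# the quotient sequence is (f(x + 1/n) - f(x)) / (1/n) = 2x + 1/n,
# whose tail hugs the real number 2x -- the standard part.
def f(x):
    return x * x

def quotient_sequence(x, N=100_000):
    return [(f(x + 1 / n) - f(x)) * n for n in range(1, N + 1)]

x = 3.0
tail = quotient_sequence(x)[-1]   # a late entry, i.e. a tiny Delta-x
print(abs(tail - 2 * x) < 1e-4)   # True: st(quotient) = 2x = 6
```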

It also turns out the hyperreals have a bunch of really cool applications in fields far removed from analysis. Check out my expository paper on the intersection of nonstandard analysis and Ramsey theory for an example!

Yet, the biggest effect I think this will have is pedagogical. I've always found the definition of a limit kind of unintuitive, and it was specifically invented to add *post hoc* coherence to calculus after it had been invented and was used widely. I suspect that formulating calculus via infinitesimals in introductory calculus classes would go a long way to making it more intuitive.

Different people will have different intuitions. I’ve always found the epsilon-delta method clear and simple, and infinitesimals made of shadows and fog when used as a basis for calculus. Every infinitesimals-first approach I have seen involves unexplained magic or papered-over cracks at some point, unexplained and papered-over because at the stage of first learning calculus the student usually doesn’t know any formal logic. There’s a reason that infinitesimals were only put on a sound footing a century after epsilon-delta. Mathematical logic had to be invented first.

Here the magic lies in depending on the axiom of choice to get a non-principal ultrafilter. And I believe I see a crack in the above definition of the derivative. \(f\) is a function on the non-standard reals, but its derivative is defined to only take standard values, so it will be constant in the infinitesimal range around any standard real. If \(f(x) = x^2\), then its derivative should surely be \(2x\) everywhere. The above definition only gives you that for standard values of \(x\).

I also think that making it more intuitive is missing the point of learning—really learning—mathematics. The idea of the slope of a curve is already intuitive. What is needed is to show the student a way of thinking about these things that does not depend on the breath of intuition to keep it aloft.

Yep, the definition is wrong. If \(f : \mathbb{R} \to \mathbb{R}\), then let \({}^*f\) denote the natural extension of this function to the hyperreals (considering \({}^*\mathbb{R}\) behaves like \(\mathbb{R}\), this should work in most cases). Then, I think the derivative should be

\(f'(x) = \operatorname{st}\left(\frac{{}^*f(x + \Delta x) - {}^*f(x)}{\Delta x}\right)\)

W.r.t. what the derivative of \({}^*f\) should be, I imagine you can describe it similarly in terms of \({}^{**}\mathbb{R}\), which by the transfer principle should exist (the transfer principle applies because of Łoś's theorem, which I don't claim to fully understand).

For \(f(x) = x^2\), the derivative then is:

\(f'(x) = \operatorname{st}\left(\frac{(x + \Delta x)^2 - x^2}{\Delta x}\right) = \operatorname{st}(2x + \Delta x) = 2x.\)

Just in case anyone was wondering why we can’t have any finite sets in the ultrafilter:

If some finite set \(\{n_1, n_2, \ldots, n_k\}\) is in an ultrafilter \(U\), then either \(\{n_1, n_2, \ldots, n_{k-1}\}\) is in \(U\) or \(\mathbb{N} \setminus \{n_1, n_2, \ldots, n_{k-1}\}\) is in \(U\). In the latter case, the intersection with the original set is \(\{n_k\}\), which must be in \(U\). In the former case, you can keep repeating this until you are left with some other one-element set.

If any one-element set {n} is in U, then membership in U is just decided by whether a set contains n or not.

When you go through the equivalence construction, this means that two sequences are equivalent if and only if they agree at the \(n\)-th position, which means that all the operations are just the same as arithmetic on that position, with the rest not mattering at all. So to get anything different, \(U\) really does have to be a *non-principal* ultrafilter.

The bracketed remark doesn't appear to be true. Why can we not have \(\{0, 2, 4, \ldots\} \in U\) or \(\{1, 3, 5, \ldots\} \in U\)? Indeed, by the definition of an ultrafilter, we must have one of them in \(U\). Also, in the post, you use \(I\) for two different purposes, which makes the post slightly less clear.

Some random thoughts.

First, it would be nice if one could go from the rationals to the hyperreals directly without having to define the reals in between (especially for people with limit allergies, as the reals are sometimes defined as limits of Cauchy sequences). I don't see a straightforward way to do so, though: you can hardly allow people to encode their reals as sequences of rationals, as otherwise the \((1/n)\) sequence would have to be equivalent to zero instead of an infinitesimal.

Also, one could split the hyperreals into equivalence classes within which the Archimedean property holds. Using big-O-adjacent notation, the reals would be \(\Theta(1)\), and the hyperreal called \(\Omega\) above would be \(\Theta(n)\). Stretching the big-O notation, one could call the equivalence class of \(\epsilon\) something like \(\Theta(1/n)\). So one has a rather large zoo of these equivalence classes. This would imply that there is no Archimedean equivalence class for the smallest infinite hyperreal: if a hyperreal \(\Theta(f(n))\) is infinite (that is, \(f(n)\) diverges), then \(\Theta(\ln(f(n)))\) is a smaller infinite hyperreal.

I am well used to there being no biggest infinity, but there being no smallest infinity would indicate that these things are neither equivalent to cardinals nor ordinals.

I found Terry Tao’s writing on the topic to be helpful for understanding, especially the connection between nonprincipal ultrafilters and Arrow’s Impossibility Theorem.

I think hyperreals are too complicated for calculus 1 and you should just talk about a non-rigorous “infinitesimal” like Newton and Leibniz did.

I agree. This is what I was going for in that paragraph. If you define derivatives & integrals with infinitesimals, then you can actually do things like treating dy/dx as a fraction without partaking in the half-in half-out dance that calc 1 teachers currently have to do.

I don't think the pedagogical benefit of nonstandard analysis is to replace Analysis I courses, but rather to give a rigorous backing to doing algebra with infinitesimals ("an infinitely small thing plus a real number is the same real number, an infinitely small thing times a real number is zero"). Improper integrals would make a lot more sense this way, IMO.

Thank you, that makes sense!

Why so? I thought they already made sense: they're "antiderivatives," so a function such that taking its derivative gives you the original function. Do you need anything further to define them?

(I know about the Riemann and Lebesgue definitions of the definite integral, but I thought indefinite integrals were much easier in comparison.)

Language mix-up. Meant improper integrals.

Now that I’m thinking about it, my memory’s fuzzy on how you’d actually calculate them rigorously w/infinitesimals. Will get back to you with an example.

Isn’t it easier to just say “If the agreement set I has a nonfinite number of elements”? Why the extra complexity?

Oh I see, so defining it with ultrafilters rules out situations like a=(1,0,1,0,1,0,...) and b=(0,1,0,1,0,1...) where both have infinite zeros and yet their product is zero.

The post is wrong in saying that \(U\) contains only cofinite sets. It obviously *must* contain plenty of sets that are neither finite nor cofinite, because the complements of those sets are also neither finite nor cofinite. Possibly the author intended to type "contains all cofinite sets" instead.

In particular, exactly one of \(a\) or \(b\) is equivalent to zero in \({}^*\mathbb{R}\). Which one is equivalent to zero depends upon exactly which non-principal ultrafilter you choose, as there are infinitely many non-principal ultrafilters. Unfortunately (as with many other applications of the Axiom of Choice), there is no finite way to specify which ultrafilter you mean.

Yep, this is correct! I’ve updated the post to reflect this.

E.g. if an ultrafilter contains the set of all even naturals, it won't contain the set of all odd naturals; neither of these sets is finite or cofinite.

Thanks, this is helpful to point out.

Of course, this makes all of this rather abstract. It looks to me like for almost any two hyperreals (e.g. \(a\), \(b\) as above), the answer to "which of them is larger?" is "It depends on the ultrafilter. Also, I cannot tell you whether a given set is part of any specific ultrafilter. But fear not: for any given ultrafilter, the hyperreals are totally ordered."

Basically for any usable theorem, one would have to prove that the result is independent of the actual ultrafilter used, which means that numbers such as a and b will probably not feature in them a lot.

I cannot fault my analysis 1 professor for opting to stick to the reals (abstract as they already are) instead.

I don’t understand some of the words you used, so please correct me if I am wrong. What are the equivalents of the original natural numbers here? Is it like 2 = { (2, 2, 2...), and all sequences that contain an infinite number of 2′s and a finite number of anything else } ?

Then we would have a *partially* ordered set, because 2 is neither greater than nor smaller than { (1, 3, 1, 3, 1, 3, ...), and its equivalents }. Is that okay?

Yes, we have \(2 = [(2, 2, 2, \ldots)]\). But we can compare \(2\) with \((1, 3, 1, 3, \ldots)\), since \([(1, 3, 1, 3, \ldots)] = 1\) (this happens when the set of all even natural numbers is in your ultrafilter) or \([(1, 3, 1, 3, \ldots)] = 3\) (this happens when the set of all odd natural numbers is in your ultrafilter). Your partially ordered set is actually a linear ordering, because whenever we have two sequences \((a_n)_n, (b_n)_n\), one of the sets

\(\{n : a_n > b_n\}, \quad \{n : a_n < b_n\}, \quad \{n : a_n = b_n\}\)

is in your ultrafilter (you can think of an ultrafilter as a thing that selects one block out of every partition of the natural numbers into finitely many pieces), and if your ultrafilter contains \(\{n : a_n > b_n\}\), then \([(a_n)_n] > [(b_n)_n]\).
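The trichotomy in this reply can be spelled out concretely; a sketch with the \((1, 3, 1, 3, \ldots)\) example (truncated, as before):

```python
# Sketch: for two sequences, the index sets where a > b, a < b, and
# a = b partition the naturals. An ultrafilter selects exactly one
# block of any finite partition, so the classes [a], [b] are always
# comparable -- the order on *R is linear.
N = 12
a = [1, 3] * (N // 2)   # (1, 3, 1, 3, ...)
b = [2] * N             # the real number 2 as a constant sequence

gt = {n for n in range(N) if a[n] > b[n]}
lt = {n for n in range(N) if a[n] < b[n]}
eq = {n for n in range(N) if a[n] == b[n]}

print(gt | lt | eq == set(range(N)))                      # True: cover
print(not (gt & lt) and not (gt & eq) and not (lt & eq))  # True: disjoint
```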

Thank you for this. It looks like a good first contact with hyperreals.

Two nitpicks:

Ω=(1,2,3,ldots). --> I think you forgot a “\” here and it is messing your formatting up.

It is not clear in the post why we use a hyperfilter, rather than just the set of all infinite sets.

Furthermore after

the slash is used for the set-minus operation. I think using \setminus there (which generates a backslash) would be a more standard notation, less likely to be mistaken for quotient structures.

I'm familiar with \setminus being used to denote set complements, so \not\in seemed more appropriate to me (\(I\) is not an element of \(U\)). I interpret \(I \setminus U\) as "the elements of \(I\) not in \(U\)," which is the empty set in this case? (Also, the elements of \(U\) are sets of naturals while the elements of \(I\) are naturals, so it's unclear to me how much this makes sense.)

Sorry, I was only quoting parts of the sentence.

What I meant was that I would change

to

I have heard of filters and ultrafilters, but I have never heard of anyone calling any sort of filter a hyperfilter. Perhaps it is because the ultrafilters are used to make fields of hyperreal numbers, so we can blame this on the terminology. Similarly, the uniform spaces where the hyperspace is complete are called supercomplete instead of hypercomplete.

But the reason why we need to use a filter instead of a collection of sets is that we need to obtain an equivalence relation.

Suppose that \(I\) is an index set and \(X_i\) is a set with \(|X_i| > 2\) for each \(i \in I\). Then let \(M\) be a collection of subsets of \(I\). Define a relation \(R\) on \(\prod_{i \in I} X_i\) by setting \(((x_i)_{i \in I}, (y_i)_{i \in I}) \in R\) if and only if \(\{i \in I : x_i = y_i\} \in M\). Then in order for \(R\) to be an equivalence relation, \(R\) must be reflexive, symmetric, and transitive. Observe that \(R\) is always symmetric, and \(R\) is reflexive precisely when \(I \in M\).

Proposition: The relation R is transitive if and only if M is a filter.

Proof:

\(\Leftarrow\): Suppose that \(M\) is a filter. Then whenever \(((x_i)_{i \in I}, (y_i)_{i \in I}), ((y_i)_{i \in I}, (z_i)_{i \in I}) \in R\), we have \(\{i \in I : x_i = y_i\}, \{i \in I : y_i = z_i\} \in M\), so since

\(\{i \in I : x_i = z_i\} \supseteq \{i \in I : x_i = y_i\} \cap \{i \in I : y_i = z_i\},\)

we conclude that \(\{i \in I : x_i = z_i\} \in M\) as well. Therefore, \(((x_i)_{i \in I}, (z_i)_{i \in I}) \in R\).

\(\Rightarrow\): Write \([x = y]\) for \(\{i \in I : x_i = y_i\}\). Suppose that \(A, B \in M\). Then let \(y = 0\), \(x = \chi_{A^c}\), \(z = 2 \cdot \chi_{B^c}\), where \(\chi\) denotes the characteristic function. Then \([x = y] = A\), \([y = z] = B\), and \([x = z] = A \cap B\). Therefore \((x, y), (y, z) \in R\), so by transitivity \((x, z) \in R\) as well, hence \(A \cap B = [x = z] \in M\).

Suppose now that \(A \subseteq B\) and \(A \in M\). Let \(x = 0\), \(y = \chi_{A^c}\), and set \(z = 2 \cdot \chi_{B^c}\). Observe that \([x = y] = A\) and \([y = z] = A \cap B = A\). Therefore \((x, y), (y, z) \in R\). Thus, by transitivity, we know that \((x, z) \in R\). Therefore, \(B = [x = z] \in M\). We conclude that \(M\) is closed under taking supersets. Therefore, \(M\) is a filter.

Q.E.D.

Oops, my bad. I re-read the post as I was typing to make sure I hadn't missed any explanation. That can sometimes cause me to type what I read instead of what I intended. I probably mixed up the prefixes because they feel similar.

Thank you for the math. I am not sure everything is right with your notation in the second half; it seems to me there must be a typo either in the intersection case or the superset one. But the ideas are clear enough to let me complete the proof.

The definition of a derivative seems wrong. For example, suppose that \(f(x) = 0\) for rational \(x\) but \(f(x) = 1\) for irrational \(x\). Then \(f\) is not differentiable anywhere, but according to your definition it would have a derivative of 0 everywhere (since \(\Delta x\) could be an infinitesimal represented by a sequence consisting only of rational numbers).

Have updated the definition of the derivative to specify the differences between f over the hyperreals and f over the reals.

I think the natural way to extend your f to the hyperreals is for it to take values in an infinitesimal neighborhood surrounding rationals to 0 and all other values to 1. Using this, the derivative is in fact undefined, as st(0/Δx)=0/st(Δx)=0/0.

First, I don’t think it’s a good idea to have to rely on the axiom of choice in order to be able to define continuity.

Now, from my point of view, saying that continuity is defined in terms of limits is the wrong way to look at it. Continuity is a property relative to the topology of your space. If you define continuity in terms of open sets, I find that not only does the definition make sense, but it also extends in general to any topological space. But I kind of understand that not everyone will find this intuitive.

Also, I believe that your definitions that replace limits with hyperreals have to take into account all possible infinitesimals, and thus I don't understand how it's really any different than the sequential characterization of limits. But maybe I'm missing something.

Let \(X,Y\) be topological spaces. Then a function \(f:X\rightarrow Y\) is continuous if and only if whenever \((x_d)_{d\in D}\) is a net that converges to the point \(x\), the net \((f(x_d))_{d\in D}\) also converges to the point \(f(x)\). This is not very hard to prove. This means that we do not have to discuss whether continuity should be defined in terms of open sets instead of limits, because both notions apply to all topological spaces.

If anything, one should define continuity in terms of closed sets instead of open sets, since closed sets generalize slightly better to objects known as closure systems (which are like topological spaces, but we do not require the union of two closed sets to be closed). For example, the collection of all subgroups of a group is a closure system, but the complements of the subgroups of a group have little importance, so if we want the definition that makes sense in the most general context, closed sets behave better than open sets. And as a bonus, the definition of continuity works well when we are taking the inverse image of closed sets and when we are taking the closure of the image of a set.

With that being said, the good thing about continuity is that it has enough characterizations so that at least one of these characterizations is satisfying (and general topology texts should give all of these characterizations even in the context of closure systems so that the reader can obtain such satisfaction with the characterization of his or her choosing).

Uhm, hyperreals really look like packaged limits; I don't expect understanding them is easier than understanding limits.