Thinking Mathematically—Convergent Sequences
This is a prototype attempt to create lessons aimed at teaching mathematical thinking to interested teenagers. The aim is to show them the nuts and bolts of how mathematics is built, rather than to teach them specific facts or how to solve specific problems. If successful I might write more.
Calculate $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \dots$
What you’ll quickly notice (or might already be aware of) as you keep on adding smaller and smaller steps is that it gets closer and closer to 2 but never quite gets there.
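If you'd like a computer to do the adding, here is a minimal sketch. It assumes the sum in question is 1 + 1/2 + 1/4 + 1/8 + ..., and the name `running_totals` is just an illustrative choice.

```python
def running_totals(count):
    """Return the running totals after 1, 2, ..., count terms of 1 + 1/2 + 1/4 + ..."""
    totals = []
    total = 0.0
    step = 1.0
    for _ in range(count):
        total += step
        totals.append(total)
        step /= 2  # each step is half the size of the previous one
    return totals


# Print how far each running total still is from 2.
for n, total in enumerate(running_totals(10), start=1):
    print(n, total, "still short of 2 by", 2 - total)
```

Every extra term halves the remaining gap, so the gap shrinks quickly but is never exactly zero.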
Now you might be tempted to say that $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots = 2$.
When we see an equation like a = b, that implies certain things:
E.g. $a + c = b + c$ for any number $c$.
E.g. If $c = d$, then $a + c = b + d$.
We’ve defined what it means for an equation to have a finite number of terms on one of the sides. We’ve never defined what it means for an equation to have an infinite number of terms. Do all those rules we have about equations apply to these infinite term equations too?
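If $1 + \frac{1}{2} + \frac{1}{4} + \dots = 2$ and $1 + \frac{1}{3} + \frac{1}{9} + \dots = \frac{3}{2}$, does $(1 + 1) + \left(\frac{1}{2} + \frac{1}{3}\right) + \left(\frac{1}{4} + \frac{1}{9}\right) + \dots = 2 + \frac{3}{2}$?
What about if we multiply the two together? Does $\left(1 + \frac{1}{2} + \frac{1}{4} + \dots\right) \times \left(1 + \frac{1}{3} + \frac{1}{9} + \dots\right) = 2 \times \frac{3}{2}$?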
What does it even mean to multiply two infinite sums together?
To answer all these questions we need to define everything in terms of things we already understand. Then we can try and prove properties about them.
The first step is to break our infinite terms into something that only has finite terms. To do that we stop talking about vague things like infinite sums, and start talking about sequences.
A sequence is just a never-ending list of numbers. 1, 2, 3, 4, … is a sequence, as is 1, 4, 9, 16, …
More formally it’s a mapping from the positive integers to the reals[1]. If you tell me you have a sequence $s$, I need to be able to ask you what the 172nd element of $s$ is, and you need to be able to answer me. We denote this as $s_{172}$.
In our case, we can define a sequence:
$s_n = 1 + \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{2^{n-1}}$
In other words, for any positive integer $n$, the $n$th number in the sequence is the sum of the first $n$ terms in $1 + \frac{1}{2} + \frac{1}{4} + \dots$. Now because we’re only ever dealing with a finite number of terms, every member of this sequence is well defined.
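The "mapping" idea translates quite directly into code: a sequence is a function that takes a position and hands back the number at that position. The sketch below is only an illustration. The names `squares` and `s` are my own, and `s` assumes the sum being discussed is 1 + 1/2 + 1/4 + ... It uses exact fractions so that no rounding creeps in.

```python
from fractions import Fraction


def squares(n):
    """The sequence 1, 4, 9, 16, ...: the n-th element is n squared."""
    return n * n


def s(n):
    """The n-th element is the sum of the first n terms of 1 + 1/2 + 1/4 + ..."""
    return sum(Fraction(1, 2**i) for i in range(n))


# You can always ask for the 172nd element and get an answer.
print(squares(172))   # 29584
print(2 - s(172))     # the (tiny) exact gap between the 172nd element and 2
```

Because each call only ever adds up a finite number of terms, there is nothing mysterious about any individual element.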
Now we want to define a property called convergence, and say that $s$ converges to 2. How might we define that?
Our first attempt might be to say that:
A sequence $s$ converges to $A$ if for all $n$, $s_{n+1}$ is always closer to $A$ than $s_n$.
But a bit of exploration proves that is totally insufficient: by that definition $s$ converges to 3, 17, and in fact any number greater than 2.
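You can do that bit of exploration numerically. The sketch below checks the "always gets closer" rule on the first 50 elements only, so it is evidence rather than proof. The helper name `always_gets_closer` is hypothetical, and `s` assumes the sequence of partial sums of 1 + 1/2 + 1/4 + ...

```python
from fractions import Fraction


def s(n):
    """Sum of the first n terms of 1 + 1/2 + 1/4 + ... (exact)."""
    return sum(Fraction(1, 2**i) for i in range(n))


def always_gets_closer(seq, target, terms=50):
    """True if each of the first `terms` elements is closer to target than the one before."""
    return all(
        abs(seq(n + 1) - target) < abs(seq(n) - target)
        for n in range(1, terms)
    )


for target in [2, 3, 17, Fraction(19, 10)]:
    print(target, always_gets_closer(s, target))
```

The first attempt happily accepts 2, 3 and 17 as "limits", while a target just below 2, such as 19/10, is rejected because the sequence eventually walks past it.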
So let’s try and address that in our next attempt:
A sequence $s$ converges to $A$ if for all $n$, $s_{n+1}$ is always closer to $A$ than $s_n$, and $s$ eventually gets infinitely close to $A$.
But we can’t just willy nilly throw around terms like infinitely close. We need to define convergence only in terms of things we already understand. What we want to express is that if you pick any value, the sequence eventually gets closer to A than that value. We can express that precisely as:
A sequence $s$ converges to $A$ if for all $n$, $s_{n+1}$ is always closer to $A$ than $s_n$, and for any positive number $\epsilon$, there is an integer $k$ such that $|s_k - A| < \epsilon$[2].
That works for all the examples we’ve given so far. But it doesn’t work for some other examples. What about:
$s_n = \frac{\sin(n)}{n}$
We definitely want to be able to express somehow that the value of this sequence is 0, but it doesn’t fit our definition: it doesn’t continuously get closer to 0. It keeps on overshooting, but the amount it overshoots by keeps getting smaller.
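To see the problem concretely, here is a small sketch using sin(n)/n as a stand-in for an overshooting sequence. That choice is an assumption made for illustration, as is the name `wobbly`. The code lists how far each of the first few elements is from 0 and picks out the places where the next element is further away than the current one.

```python
import math


def wobbly(n):
    """An oscillating sequence that swings above and below 0 with shrinking swings."""
    return math.sin(n) / n


# distances[i] is the distance from 0 of element number i + 1.
distances = [abs(wobbly(n)) for n in range(1, 8)]
for n, d in enumerate(distances, start=1):
    print(n, round(d, 4))

# Positions n where element n + 1 is further from 0 than element n.
moves_away = [n for n in range(1, 7) if distances[n] > distances[n - 1]]
print("moves away after positions:", moves_away)
```

Any position in `moves_away` breaks the "always closer" requirement, even though the elements are clearly settling down towards 0 overall.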
So we need to loosen our criteria. Instead of requiring it to always get closer to A, we can say that for any number, the sequence needs to eventually get closer to A than that number, and stay there:
A sequence $s$ converges to $A$ if for any positive number $\epsilon$, there is an integer $k$ such that for all integers $n$ where $n \ge k$, $|s_n - A| < \epsilon$.
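One way to get a feel for this definition is to play the game it describes: pick a tolerance, then hunt for the point after which the sequence stays inside it. The sketch below does this over a finite horizon only (a computer can't check infinitely many elements), so it illustrates the definition rather than proving anything. The names `find_k`, `s` and `wobbly` are hypothetical, and the two example sequences are assumptions: the partial sums of 1 + 1/2 + 1/4 + ... and the oscillating sin(n)/n.

```python
import math
from fractions import Fraction


def s(n):
    """Sum of the first n terms of 1 + 1/2 + 1/4 + ... (exact)."""
    return sum(Fraction(1, 2**i) for i in range(n))


def wobbly(n):
    """An oscillating sequence whose swings around 0 keep shrinking."""
    return math.sin(n) / n


def find_k(seq, limit, eps, horizon=200):
    """Smallest k such that |seq(n) - limit| < eps for every n from k up to horizon.

    Returns None if even the last element checked is too far away.
    """
    k = None
    for n in range(horizon, 0, -1):  # walk backwards from the horizon
        if abs(seq(n) - limit) >= eps:
            break                    # found an element outside the tolerance
        k = n
    return k


print(find_k(s, 2, Fraction(1, 1000)))   # 11
print(find_k(wobbly, 0, 0.1))            # 9
print(find_k(s, 3, Fraction(1, 1000)))   # None
```

Notice that 3 is no longer accepted as a limit of the first sequence, and the oscillating sequence is accepted as converging to 0, which is exactly what the earlier attempts got wrong.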
And this is indeed the most common definition of convergence that mathematicians use.
With this tool we can now start to explore concrete questions about convergence.
For example let’s define the sum of two sequences as follows:
$u = s + t$ if $u_n = s_n + t_n$ for all $n$.
Now if $s$ converges to $A$, and $t$ converges to $B$, does $s + t$ necessarily converge to $A + B$?
We can attempt to sketch out a proof:
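For any positive number $\epsilon$, there is a value $k_s$ such that $|s_n - A| < \frac{\epsilon}{2}$ for all $n \ge k_s$, and a value $k_t$ such that for all $n \ge k_t$, $|t_n - B| < \frac{\epsilon}{2}$. Without loss of generality[3], assume that $k_s \ge k_t$. Then for all $n \ge k_s$, we have $|(s_n + t_n) - (A + B)| \le |s_n - A| + |t_n - B| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$.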
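The proof sketch can be sanity-checked on a concrete pair of sequences. Everything in the example below is an assumption chosen for illustration: `s` is the partial sums of 1 + 1/2 + 1/4 + ... (limit 2), `t` is the partial sums of 1 + 1/3 + 1/9 + ... (limit 3/2), and `find_k` is a hypothetical helper that searches a finite horizon only. Each sequence is given half the tolerance, and the larger of the two resulting cut-off points should then work for the sum.

```python
from fractions import Fraction


def s(n):
    """Partial sums of 1 + 1/2 + 1/4 + ..., which approach 2."""
    return sum(Fraction(1, 2**i) for i in range(n))


def t(n):
    """Partial sums of 1 + 1/3 + 1/9 + ..., which approach 3/2."""
    return sum(Fraction(1, 3**i) for i in range(n))


def u(n):
    """The sum sequence: add the two sequences element by element."""
    return s(n) + t(n)


def find_k(seq, limit, eps, horizon=100):
    """Smallest k with |seq(n) - limit| < eps for all n from k up to horizon, else None."""
    k = None
    for n in range(horizon, 0, -1):
        if abs(seq(n) - limit) >= eps:
            break
        k = n
    return k


eps = Fraction(1, 100)
k_s = find_k(s, 2, eps / 2)                 # where s gets within eps/2 of 2
k_t = find_k(t, Fraction(3, 2), eps / 2)    # where t gets within eps/2 of 3/2
k_u = find_k(u, 2 + Fraction(3, 2), eps)    # where the sum gets within eps of 7/2
print(k_s, k_t, k_u)                        # 9 6 8
```

The sum settles inside the tolerance no later than the slower of the two sequences does, which matches what the argument predicts.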
Not only does this prove that adding sequences works as we expect, it also hints that when we do so the resultant sequence converges nearly as fast as the slower of the two constituent sequences. Convergence speeds are a relatively advanced topic, but you can bet that the first thing you’ll do if you study it is try to define precisely what it means for a sequence to converge quickly or slowly.
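Now the usual notation for $s$ converges to $A$ is:
$\lim_{n \to \infty} s_n = A$
However you’ll sometimes see mathematicians skipping this notation. Instead of writing $\lim_{n \to \infty} \left(1 + \frac{1}{2} + \frac{1}{4} + \dots + \frac{1}{2^{n-1}}\right) = 2$ they’ll just write: $1 + \frac{1}{2} + \frac{1}{4} + \dots = 2$.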
What’s going on here?
Firstly, mathematicians are lazy, and since everyone who reads the second form will understand it means the same thing as the first, why bother writing it out in full?
But it’s actually hinting at something deeper—when you work with limits of sums they often behave as though you literally are just adding an infinite number of terms. Manipulations like the one Euler used for the Basel problem often happen to work even though they aren’t actually justified by our definitions, and this notation can give you the hint you need to attempt just such an “illegal” manipulation before you commit yourself to finding a full blown formal proof.
Later mathematicians then discovered some of the conditions under which you can treat a limit as a normal sum, and so such loose notation can actually provide fertile ground for future mathematical discoveries. This isn’t an isolated event—such formalisation of informal notation has been repeated across many disparate branches of mathematics.
[1] It doesn’t have to be the reals; you could have a sequence of functions or shapes or whatever, but for our purposes it’s the reals.
[2] Those vertical lines mean the absolute value, which means to ignore whether the value is positive or negative. So $|3| = |-3| = 3$. Here it’s just used to express that $s_k$ and $A$ are less than $\epsilon$ distance apart, without specifying whether $s_k$ is greater than or less than $A$.
[3] Without loss of generality is another way of saying that we’re going to prove one scenario (here where $k_s \ge k_t$), but the proof is identical in other scenarios (e.g. when $k_t > k_s$) if you just switch the symbols around (so call $s$ by the name $t$ and vice versa), so we don’t want to repeat the proof multiple times for each scenario.
This full definition of a limit is quite technical, and has many logical layers that are hard to understand for someone inexperienced in the field:
- You start with a real number $A$ and an infinite sequence of real numbers $s_1, s_2, \dots$ that’s indexed by a natural number $n$.
- In order for the convergence to hold, you need a certain property to hold for all real numbers $\epsilon$, after further conditioning on $\epsilon > 0$.
- The specific condition that needs to hold is that, depending on this $\epsilon$ (as well as depending on the earlier variables, $s_n$ and $A$), there exists a natural number $k$ that satisfies a condition.
- The condition that this number $k$ satisfies is that for all natural numbers $n \ge k$, the inequality $|s_n - A| < \epsilon$ is satisfied.
Each bullet point relies on the previous one, so you either understand all points at once or none at all.
There are 5 different variables here, and each one plays an important and distinct role - A is the limit, s is the sequence, n is an index to the sequence, ε is a “sensitivity” parameter to measure closeness to the limit and k is a “largeness” parameter to measure how large your index must be for the sequence to be close enough to the limit.
Two of these variables are given from the start, while three of them have an existential or universal quantifier. The order of the quantifiers is critical—first a universal one, then an existential one, then again a universal one. Each variable depends on all the previous ones in the definition.
Also, these 5 variables cover 3 different “data types”: two are real numbers, two are natural numbers and one is a function-type (mapping natural numbers to real numbers). The student also has to understand and remember which of the data types appear in each of the 3 quantified variables (this is critical because the definition of a limit for real-valued functions switches up the datatypes for the k,n variables).
There are also 3 required inequalities - $\epsilon > 0$, $n \ge k$, $|s_n - A| < \epsilon$. Each one plays an important and different role. The student has to understand and remember which type of inequality appears in each part, out of the set of “reasonable” relations: {<, ≤, >, ≥, =, ≠}. Also, the second and third inequalities may change to $n > k$, $|s_n - A| \le \epsilon$ and the definition still works, but the first inequality can’t change to $\epsilon \ge 0$ without completely ruining the definition.
All in all, I like intuitive approaches to mathematics and I don’t think this subject is inherently inaccessible, I just think that the limit definition should have a lot more motivation—each variable, quantifier and inequality should become “obvious” and the student should be able to reconstruct it from first principles.
https://www.math.ucla.edu/%7Etao/resource/general/131ah.1.03w/
I like Terry Tao’s approach here, with intermediate definitions of “ε-close” and “eventually ε-close” in order to make the final definition less cluttered.
I wish mathematicians would take a page out of computer science/software engineering, where we’ve collectively decided that single-character variable names are bad practice.
I do understand the value and beauty of a terse notation, especially when hand-writing it, but I can also appreciate similar beauty of well-structured and self-documenting code, especially within a code editor that uses a language server that can provide context for any symbol. White space hints at structure and comments clarify the more difficult to parse sections of code.
I’m constantly algebraically manipulating symbols. We generally call it refactoring, but it’s the same thing up to an isomorphism. I aim to write my code in such a way to minimize the cognitive load on the reader. Using single-character symbols adds a whole layer of cognitive load where the reader needs to keep a mental map of what each symbol represents, especially when the convention chooses an arbitrary symbol, rather than at least using abbreviations. This feels especially onerous for students who are trying to learn the concepts behind the symbols, while trying to keep track of what each symbol represents.
This really is just a general rant. You did a good job with your explanation. It balances technical nuance with approachability. That really is why I even had this thought to begin with.
Thanks for sharing!
This will make sense when mathematicians stop using pen and paper. Or maybe only for presenting the final equation. Otherwise solving an equation would take ten pages full of prose—and the terse notation was historically invented to prevent exactly this.
For an introduction to young audiences, I think it’s better to get the point across in less technical terms before trying to formalize it. The OP jumps to epsilon pretty quickly. I would try to get to a description like “A sequence converges to a limit L if its terms are ‘eventually’ arbitrarily close to L. That is, no matter how small a (nonzero) tolerance you pick, there is a point in the sequence where all of the remaining terms are within that tolerance.” Then you can formalize the tolerance, epsilon, and the point in the sequence, k, that depends on epsilon.
Note that this doesn’t depend on the sequence being indexed by integers or the limit being a real number. More generally, given a directed set (S, ≤), a topological space X, and a function f: S → X, a point x in X is the limit of f if for any neighborhood U of x, there exists t in S where s ≥ t implies f(s) in U. That is, for every neighborhood U of x, f is “eventually” in U.
I agree. I think real analysis should really be taking a more topological approach to limits and continuity. In a topology classroom, they would instead define a limit in the real numbers as “every open ball around your limit point contains all of the elements of the sequence past a certain index”, which is much the same as your description of Terry Tao’s “ϵ-close” and “eventually ϵ-close”. Likewise, a continuous function would be defined, “For every open ball around f(x) in the range, there is an open ball around x in the domain whose points all get mapped inside the range’s ball.” The whole ϵ-δ definition obscures what is really going on with a bunch of mathematical jargon.