Foreword

It’s been too long—a month and a half since my last review, and about three months since Analysis I. I’ve been immersed in my work for CHAI, but reality doesn’t grade on a curve, and I want more mathematical firepower.

On the other hand, I’ve been cooking up something really special, so watch this space!

Analysis II

12: Metric Spaces

Metric spaces; completeness and compactness.

Proving Completeness

It sucks, and I hate it.

13: Continuous Functions on Metric Spaces

Generalized continuity, and how it interacts with the considerations introduced in the previous chapter. Also, a terrible introduction to topology.

There’s a lot I wanted to say here about topology, but I don’t think my understanding is good enough to break things down—I’ll have to read an actual book on the subject.

14: Uniform Convergence

Pointwise and uniform convergence, the Weierstrass $M$-test, and uniform approximation by polynomials.

Breaking Point

Suppose we have the sequence of functions $f^{(n)} : [0, 1] \to R$, $f^{(n)} (x) := x^{n}$, which converges pointwise to the 1-indicator function $f : [0, 1] \to R$ (i.e., $f (1) = 1$ and $f (x) = 0$ otherwise). Clearly, each $f^{(n)}$ is (infinitely) differentiable; however, the limiting function $f$ isn’t even continuous at $x = 1$, let alone differentiable there! Basically, pointwise convergence isn’t at all strong enough to stop the limit from “snapping” the continuity of its constituent functions.
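A quick numerical sketch of why this convergence is pointwise but not uniform (the helper names are my own, not Tao's): at any fixed $x < 1$ the values $x^{n}$ shrink to $0$, yet for every $n$ the point $x_{n} = 2^{- 1 / n}$ still satisfies $x_{n}^{n} = 1 / 2$, so the sup-norm distance to the limit never drops below $1 / 2$.

```python
def f_n(n, x):
    """The n-th function in the sequence, x -> x^n on [0, 1]."""
    return x ** n

def f_limit(x):
    """Pointwise limit: the indicator of the single point x = 1."""
    return 1.0 if x == 1 else 0.0

def witness(n):
    """A point x_n in [0, 1) chosen so that x_n ** n is 1/2 (up to rounding)."""
    return 0.5 ** (1.0 / n)

for n in (1, 10, 100, 1000):
    x_n = witness(n)
    gap = abs(f_n(n, x_n) - f_limit(x_n))
    # Second column: value at the fixed point 0.9 (shrinks toward 0).
    # Third column: gap at the moving witness point (stays near 1/2).
    print(n, f_n(n, 0.9), gap)
```

The fixed-point column decays while the witness column does not, which is exactly the difference between the two modes of convergence.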

Progress

As in previous posts, I mark my progression by sharing a result derived without outside help.

Already proven: $\int_{- 1}^{1} (1 - x^{2})^{N} d x \geq \frac{1}{\sqrt{N}}$ .

Definition. Let $ϵ > 0$ and $0 < δ < 1$ . A function $f : R \to R$ is said to be an $(ϵ, δ)$ -approximation to the identity if it obeys the following three properties:

$f$ is compactly supported on $[- 1, 1]$ .
$f$ is continuous, and $\int_{- \infty}^{\infty} f = 1$ .
$| f (x) | \leq ϵ$ for all $δ \leq | x | \leq 1$ .

Lemma: For every $ϵ > 0$ and $0 < δ < 1$ , there exists an $(ϵ, δ)$ -approximation to the identity which is a polynomial $P$ on $[- 1, 1]$ .

Proof of Exercise 14.8.2(c). Suppose $c \in \mathbb{R}$ and $N \in \mathbb{N}$ with $N \geq 1$; define $f (x) := c (1 - x^{2})^{N}$ for $x \in [- 1, 1]$ and $0$ otherwise. Clearly, $f$ is compactly supported on $[- 1, 1]$ and is continuous (it vanishes at $x = \pm 1$). We want to find $c, N$ such that the second and third properties are satisfied. Since $(1 - x^{2})^{N}$ is non-negative on $[- 1, 1]$, $c$ must be positive, as $f$ must integrate to $1$. Therefore, $f$ is non-negative.

We want to show that $| c (1 - x^{2})^{N} | \leq ϵ$ for all $δ \leq | x | \leq 1$. Since $f$ is non-negative and $c > 0$, we may simplify to $(1 - x^{2})^{N} \leq \frac{ϵ}{c}$. Since the left-hand side is strictly monotone increasing on $[- 1, - δ]$ and strictly monotone decreasing on $[δ, 1]$, it is largest at $| x | = δ$, so it suffices to check $x = δ$. As $(1 - δ^{2})^{N} > 0$, we may divide through by it and multiply by $c$, arriving at $ϵ (1 - δ^{2})^{- N} \geq c$.

We want $\int_{- \infty}^{\infty} f = 1$; as $f$ is compactly supported on $[- 1, 1]$, this is equivalent to $\int_{- 1}^{1} f (x) d x = 1$. Using basic properties of the Riemann integral, we have $\int_{- 1}^{1} (1 - x^{2})^{N} d x = \frac{1}{c}$. Since the bound $c \leq ϵ (1 - δ^{2})^{- N}$ is the same as $\frac{1}{c} \geq ϵ^{- 1} (1 - δ^{2})^{N}$, it suffices to have

$$ϵ^{- 1} (1 - δ^{2})^{N} \leq \frac{1}{\sqrt{N}} \leq \int_{- 1}^{1} (1 - x^{2})^{N} d x,$$

with the second inequality already having been proven earlier. Note that although the first inequality is not true for every $N$, we can make it so: since $ϵ$ is fixed and $1 - δ^{2} \in (0, 1)$, the left-hand side approaches $0$ more quickly than $\frac{1}{\sqrt{N}}$ does. Therefore, we can make $N$ as large as necessary; isolating $ϵ$, the requirement becomes

$$ϵ \geq (1 - δ^{2})^{N} \sqrt{N},$$

and the right-hand side tends to $0$ as $N \to \infty$, because the geometric decay of $(1 - δ^{2})^{N}$ outpaces the growth of $\sqrt{N}$. Then set $N$ to be any natural number such that this inequality is satisfied. Finally, we set $c = \frac{1}{\int_{- 1}^{1} (1 - x^{2})^{N} d x}$. By construction, these values of $c, N$ satisfy the second and third properties. □
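As a sanity check on the construction, here is a minimal numerical sketch. The function names are hypothetical, and I am assuming the standard recurrence $I_{N} = \frac{2 N}{2 N + 1} I_{N - 1}$ with $I_{0} = 2$ for $I_{N} = \int_{- 1}^{1} (1 - x^{2})^{N} d x$, which is not taken from the text. It searches for the first $N$ with $(1 - δ^{2})^{N} \sqrt{N} \leq ϵ$ and then sets $c$ to the reciprocal of the integral.

```python
import math

def bump_integral(N):
    """Integral of (1 - x^2)^N over [-1, 1], via I_N = I_{N-1} * 2N / (2N + 1), I_0 = 2."""
    value = 2.0
    for k in range(1, N + 1):
        value *= 2 * k / (2 * k + 1)
    return value

def find_parameters(eps, delta):
    """Return (c, N) with N the first natural number meeting the sufficient condition."""
    N = 1
    while (1 - delta**2) ** N * math.sqrt(N) > eps:
        N += 1
    c = 1.0 / bump_integral(N)  # forces f to integrate to 1
    return c, N

eps, delta = 0.01, 0.1
c, N = find_parameters(eps, delta)
# Largest value of f on delta <= |x| <= 1 is attained at |x| = delta.
print(N, c, c * (1 - delta**2) ** N)
```

Any larger $N$ satisfying the inequality would work equally well; stopping at the first one is just a convenience.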

Convoluted No Longer

Those looking for an excellent explanation of convolutions, look no further!

Weierstrass Approximation Theorem

Theorem. Suppose $f : [a, b] \to R$ is continuous and compactly supported on $[a, b]$. Then for every $ϵ > 0$, there exists a polynomial $P$ such that $\| P - f \|_{\infty} < ϵ$.

In other words, any continuous, real-valued $f$ on a finite interval can be approximated with arbitrary precision by polynomials.
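To watch the theorem in action, here is a small sketch using Bernstein polynomials. This is a different construction from the convolution-based proof in the book, swapped in because it fits in a few lines; the function names and the test function are my own choices. It approximates a continuous function with a kink on $[0, 1]$ and reports the largest error on a grid.

```python
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(n + 1))

def sup_error(f, n, samples=201):
    """Largest gap |B_n f - f| over an evenly spaced grid (a proxy for the sup norm)."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)

def kink(x):
    """Continuous on [0, 1] but not differentiable at x = 1/2."""
    return abs(x - 0.5)

for n in (10, 50, 200):
    print(n, sup_error(kink, n))
```

The error shrinks slowly as the degree grows, which is a reminder that the theorem promises existence of a good polynomial and says nothing about how low its degree can be.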

Why I’m talking about this. On one hand, this result makes sense, especially after taking machine learning and seeing how polynomials can be contorted into basically whatever shape you want.

On the other hand, I find this theorem intensely beautiful. $\overline{P ([a, b])} = C ([a, b])$’s proof was slowly constructed, much to the reader’s benefit. I remember the very moment the proof sketch came to me, newly-installed gears whirring happily.

15: Power Series

Real analytic functions, Abel’s theorem, $\exp$ and $\log$, complex numbers, and trigonometric functions.

$EXP$

Cached thought from my CS undergrad: exponential functions always end up growing more quickly than polynomials, no matter the degree. Now, I finally have the gears to see why:

$$\exp (x) := \sum_{k = 0}^{\infty} \frac{x^{k}}{k!} .$$

$\exp$ has all the degrees, so no polynomial (of necessarily finite degree) could ever hope to compete! This also suggests why $\frac{d}{d x} e^{x} = e^{x}$.
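Both remarks can be spelled out in a couple of lines. This is a sketch that assumes term-by-term differentiation, which is legitimate for a power series inside its radius of convergence. Differentiating the series shifts every term down by one degree:

$$\frac{d}{d x} \sum_{k = 0}^{\infty} \frac{x^{k}}{k!} = \sum_{k = 1}^{\infty} \frac{k x^{k - 1}}{k!} = \sum_{k = 1}^{\infty} \frac{x^{k - 1}}{(k - 1)!} = \sum_{j = 0}^{\infty} \frac{x^{j}}{j!} .$$

For the growth claim, take $x > 0$ and any degree $d$. Every term of the series is positive, so keeping only the term of degree $d + 1$ gives

$$\frac{1}{x^{d}} \sum_{k = 0}^{\infty} \frac{x^{k}}{k!} \geq \frac{1}{x^{d}} \cdot \frac{x^{d + 1}}{(d + 1)!} = \frac{x}{(d + 1)!} \to \infty \quad \text{as } x \to \infty .$$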

Complex Exponentiation

You can multiply a number by itself some number of times.

[nods]

You can multiply a number by itself a negative number of times.

[Sure.]

You can multiply a number by itself an irrational number of times.

[OK, I understand limits.]

You can multiply a number by itself an imaginary number of times.

[Out. Now.]

Seriously, this one’s weird (rather, it seems weird, but how can “how the world is” be “weird”?).

Suppose we have some $c \in C$ , where $c = a + b i$ . Then $e^{c} = e^{a} e^{b i}$ , so “all” we need to figure out is how to take an imaginary exponent. Brian Slesinsky has us covered.
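For a concrete check, here is a short sketch assuming Euler's formula $e^{b i} = \cos b + i \sin b$, the standard answer to the imaginary-exponent question (I am not reproducing the linked explanation). The helper name `exp_via_euler` is made up for illustration; `cmath.exp` is Python's built-in complex exponential, used as the reference.

```python
import cmath
import math

def exp_via_euler(c):
    """e^c for complex c = a + bi, computed as e^a * (cos b + i sin b)."""
    a, b = c.real, c.imag
    return math.exp(a) * complex(math.cos(b), math.sin(b))

c = complex(1.5, 2.0)
print(exp_via_euler(c), cmath.exp(c))          # the two should agree up to rounding
print(exp_via_euler(complex(0.0, math.pi)))    # close to -1, up to rounding
```

The real part $a$ only scales the result, while the imaginary part $b$ rotates it around the unit circle.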

Years before becoming involved with the rationalist community, Nate asks this question, and Qiaochu answers.

This isn’t a coincidence, because nothing is ever a coincidence.

Or maybe it is a coincidence, because Qiaochu answered every question on StackExchange.

16: Fourier Series

Periodic functions, trigonometric polynomials, periodic convolutions, and the Fourier theorem.

17: Several Variable Differential Calculus

A beautiful unification of linear algebra and calculus: linear maps as derivatives of multivariate functions, partial and directional derivatives, Clairaut’s theorem, contractions and fixed points, and the inverse and implicit function theorems.

Implicit Function Theorem

If you have a set of points in $R^{n}$ , when do you know if it’s secretly a function $g : R^{n - 1} \to R$ ? For functions $R \to R$ , we can just use the geometric “vertical line test” to figure this out, but that’s a bit harder when you only have an algebraic definition. Also, sometimes we can implicitly define a function locally by restricting its domain (even if no explicit form exists for the whole set).

Theorem. Let $E$ be an open subset of $R^{n}$, let $f : E \to R$ be continuously differentiable, and let $y = (y_{1}, \dots, y_{n})$ be a point in $E$ such that $f (y) = 0$ and $\frac{\partial f}{\partial x_{n}} (y) \neq 0$. Then there exists an open $U \subseteq R^{n - 1}$ containing $(y_{1}, \dots, y_{n - 1})$, an open $V \subseteq E$ containing $y$, and a function $g : U \to R$ such that $g (y_{1}, \dots, y_{n - 1}) = y_{n}$, and

$$\{(x_{1}, \dots, x_{n}) \in V : f (x_{1}, \dots, x_{n}) = 0\} = \{(x_{1}, \dots, x_{n - 1}, g (x_{1}, \dots, x_{n - 1})) : (x_{1}, \dots, x_{n - 1}) \in U\} .$$

So, I think what’s really going on here is that we’re using the derivative at this known zero to locally linearize the manifold we’re operating on (similar to Newton’s approximation), which lets us have some neighborhood $U$ in which we can derive an implicit function, even if we can’t always write it out.
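A toy instance makes this concrete. The sketch below is my own example, not one from the book: it takes $f (x, y) = x^{2} + y^{2} - 1$ with the known zero $(0, 1)$, where $\frac{\partial f}{\partial y} = 2 \neq 0$, and recovers the implicit function numerically by running Newton's method in the last variable. Here we happen to know the explicit answer, $g (x) = \sqrt{1 - x^{2}}$, so we can compare.

```python
import math

def f(x, y):
    """Constraint whose zero set is the unit circle."""
    return x * x + y * y - 1.0

def df_dy(x, y):
    """Partial derivative of f in the last variable; nonzero near (0, 1)."""
    return 2.0 * y

def implicit_g(x, y_start=1.0, steps=50, tol=1e-14):
    """Solve f(x, y) = 0 for y by Newton's method, starting at the known zero's height."""
    y = y_start
    for _ in range(steps):
        step = f(x, y) / df_dy(x, y)
        y -= step
        if abs(step) < tol:
            break
    return y

for x in (-0.3, 0.0, 0.3):
    print(x, implicit_g(x), math.sqrt(1.0 - x * x))
```

Starting the iteration at $y = 1$ is what selects the upper arc; near $(\pm 1, 0)$ the partial derivative vanishes and the iteration (like the theorem) stops being applicable.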

18: Lebesgue Measure

Outer measure; measurable sets and functions.

Tao lists desiderata for an ideal measure before deriving it. Imagine that.

19: Lebesgue Integration

Building up the Lebesgue integral, culminating with Fubini’s theorem.

Conceptual Rotation

Suppose $Ω \subseteq R^{n}$ is measurable, and let $f : Ω \to [0, \infty]$ be a measurable, non-negative function. The Lebesgue integral of $f$ is then defined as

$$\int_{Ω} f := \sup \left\{ \int_{Ω} s : s \text{ is simple and non-negative, and minorizes } f \right\} .$$

This hews closely to how we defined the lower Riemann integral in Chapter 11; however, we don’t need the equivalent of the upper Riemann integral for the Lebesgue integral.

To see why, let’s review why Riemann integrability demands the equality of the lower and upper Riemann integrals of a function $g$. Suppose that we integrate over $[0, 1]$, and that $g$ is the indicator function for the rationals. As the rationals are dense in the reals, any interval $[a, b] \subseteq [0, 1]$ ($b > a$) contains rational numbers, no matter how much the interval shrinks! Therefore, the upper Riemann integral equals 1, while the lower equals 0 (since the irrationals are dense as well), so $g$ is not Riemann integrable. However, $g$ is Lebesgue integrable; since it’s 0 almost everywhere (as the rationals have measure 0), its integral is 0.

This marks a fundamental shift in how we integrate. With the Riemann integral, we partition the domain and consider the infimum of the upper Riemann sums and the supremum of the lower Riemann sums over increasingly refined partitions; this is the length approach. In Lebesgue integration, however, we consider which $E \subseteq Ω$ is responsible for each value $y$ in the range (i.e., $f^{- 1} (y) = E$), multiplying $y$ by the measure of $E$; this is inversion.
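For a simple function the two viewpoints can be compared directly. The sketch below uses made-up data and hypothetical function names: it integrates a step function on $[0, 1)$ once by walking the domain piece by piece, and once by grouping the domain by value and weighting each value by the measure of its preimage.

```python
from collections import defaultdict
from fractions import Fraction

# A simple function on [0, 1): (left endpoint, right endpoint, value) pieces.
pieces = [
    (Fraction(0), Fraction(1, 4), 2),
    (Fraction(1, 4), Fraction(1, 2), 5),
    (Fraction(1, 2), Fraction(3, 4), 2),
    (Fraction(3, 4), Fraction(1), 5),
]

def riemann_style(pieces):
    """Walk the domain left to right, adding value * width for each piece."""
    return sum(value * (right - left) for left, right, value in pieces)

def lebesgue_style(pieces):
    """Group by value y, measure the preimage of y, then add y * measure."""
    measure_of_preimage = defaultdict(Fraction)
    for left, right, value in pieces:
        measure_of_preimage[value] += right - left
    return sum(y * m for y, m in measure_of_preimage.items())

print(riemann_style(pieces), lebesgue_style(pieces))
```

Both routes agree here because the preimages are finite unions of intervals. The second route keeps working when the preimages are merely measurable, such as the rationals in $[0, 1]$, which is where the first one breaks down.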

In a sense, the Lebesgue integral more cleanly strikes at the heart of what it means to integrate. Surely, Riemann integration was not far from the mark; however, if you rotate the problem slightly in your mind, you will find a better, cleaner way of structuring your thinking.

Final Thoughts

Although Tao botches a few exercises and the section on topology, I’m a big fan of Analysis I and II. Do note, however, that II is far more difficult than I (not just in content, but in terms of the exercises). He generally provides relevant, appropriately-difficult problems, and is quite adept at helping the reader develop rigorous and intuitive understanding of the material.

Forwards

Next is Jaynes’ Probability Theory.

Tips

To avoid getting hung up in Chapter 17, this book should be read after a linear algebra text.
Don’t do exercise 17.6.3 - it’s wrong.
Deep understanding comes from sweating it out. Don’t hide, don’t wave away bothersome details; stay and explore. If you follow my strategy of quickly generating outlines, ask yourself: can you formally and precisely write out each step?

Verification

I completed every exercise in this book; in the second half, I started avoiding looking at the hints provided by problems until I’d already thought for a few minutes. Often, I’d solve the problem and then turn to the hint: “be careful when doing X—don’t forget edge case Y; hint: use lemma Z”! A pit would form in my stomach as I prepared to locate my mistake and back-propagate where-I-should-have-looked, before realizing that I’d already taken care of that edge case using that lemma.

Why Bother?

One can argue that my time would be better spent picking up things as I work on problems in alignment. However, while I’ve made, uh, quite a bit of progress with impact measures this way, concept-shaped holes are impossible to notice. If there’s some helpful information-theoretic way of viewing a problem that I’d only realize if I had already taken information theory, I’m out of luck.

Also, developing mathematical maturity brings with it a more rigorous thought process.

Fairness

There’s a sense I get where even though I’ve made immense progress over the past few months, it still might not be enough. The standard isn’t “am I doing impressive things for my reference class?”, but rather the stricter “am I good enough to solve serious problems that might not get solved in time otherwise?”. This is quite the standard, and even given my textbook and research progress (including the upcoming posts), I don’t think I meet it.

In a way, this excites me. I welcome any advice for buckling down further and becoming yet stronger.

If you are interested in working with me or others on the task of learning MIRI-relevant math, if you have a burning desire to knock the alignment problem down a peg—I would be more than happy to work with you. Messaging me may also net you an invitation to the MIRIx Discord server.

On a related note: thank you to everyone who has helped me; in particular, TheMajor has been incredibly generous with their explanations and encouragement.

Turning Up the Heat: Insights from Tao’s ‘Analysis II’