Notes on Fourier Analysis
This post records what I’ve learned while studying a bit of Fourier analysis. I used this PDF, which is the lecture notes for this Stanford course. The only thing in here that is really changed from there is the derivation of the Fourier transform, where I tried to explain the way I made sense of it. (That explanation may or may not make sense.)
Fourier Series
Fourier analysis starts with the study of periodic functions. The fundamental periodic function is the complex exponential
Of course, most of our functions are real-valued, not complex. Fourier himself developed his theory of Fourier series and transforms using sines and cosines. But we have since found that the equations come out much neater if you write them in terms of complex exponentials. So we will write sines and cosines in terms of the complex exponential:
And we’ll take Fourier series and Fourier transforms of complex-valued real-argument functions,
We said we would talk about periodic functions. A function
First of all, the complex exponentials that have period
Second, we need the linear combinations of those exponentials, since they’re also periodic with period
Since there are countably infinitely many base exponentials, we can also make infinite series with them, and these are also periodic:
And we’re done!
It turns out that all reasonable[3] periodic functions with a given period
The set of complex-valued functions is a vector space, since we can add them and multiply them by complex numbers:
Since the functions are periodic, we can integrate over any segment of length
I claimed that every periodic function can be written as a Fourier series. This is the same as saying that the complex exponentials are a basis[5] for our vector space. It turns out, the complex exponentials are an orthogonal basis:
So, if we have a function
(If you’re not comfortable working with dot products of functions, you should expand those into the integral definition and see that this works, I’m just omitting it here for brevity.)
So if we know
We can also write the coefficients
The periodic function of time
When a function is real-valued, the coefficients
A common exercise to get the feel of Fourier series is finding the coefficients for a function that is a repeated polynomial, like one of these:
The four horsemen of Fourier series exercises or something. I am not good at image editing.
You can do these if you want to practice. We’ll do one example, which looks kind of like the top left one, a square wave. It will be useful to us next section. Our function will be
in the range
for
These coefficients turn out to be real, and so the elements all become cosines. This always happens when the function is even (like the cosines themselves); when it’s not, some of the coefficients will be complex. The constant term,
You can play with a graph of this series here, changing how many terms of the series are added and the period
The Fourier Transform
Fourier series are very useful for analyzing periodic functions. But what if your function is not periodic? Can we analyze non-periodic functions in terms of complex exponentials? Well, easy, a non-periodic function is just a periodic function with period
In our
So, what happens to the Fourier coefficients as we increase
you’ll see that if
Let’s do another experiment. Since our
Now each coefficient, which we can name
But both
The even coefficients are the same as the coefficients we got before! This makes sense, since they are multiplying the exponentials with the same frequencies.
What about the odd coefficients,
Now, for the point of all this: we’ve been treating
Something like this.
But why would we want to do it like that? Well, because we can: we conveniently have these new frequencies lying around that have opposite signs in the two different halves of the integral. The frequencies
Well, that’s my intuitive non-rigorous deduction, at least. Does it actually work? If you go back to the Desmos demonstration, you will see a hidden function definition in line 6; click on the circle besides it to show it. It looks like this:
The function being below the red one and having a slope in the 0 portion keeps happening when you increase N, but if you increase the period T these mismatches go down.
This a sum of cosines with frequencies between the ones you’re summing in the normal series, weighed by the average between the Fourier coefficients of the frequencies above and below it. And as you can see, it is close to the normal sum at
So the function that is like a
Even Fourier coefficients
with values close to ;and odd Fourier coefficients
with values in between and ;
so that the even coefficients add up to half the red function above, the odd ones add to half the purple function above, and they all together add up to their average: a function that is like
That’s what happens as we increase
It’s like we’re “spreading out” the coefficients around
This distribution will be a function of the frequency
We’re adding
This function
Using the Fourier Transform
Now our equations are these:
Very nice and symmetrical. If
Let’s go back to our square wave function! With an infinite period there is only one positive pulse, and it becomes the rect function:
The Fourier coefficients of our periodic
If we divide by
a function called
Just as expected. Unfortunately, the reverse integral
This is what sinc looks like.
Some useful properties of the Fourier transform, all of which are easy to prove from the definition:
It’s linear:
It’s unitary, meaning it preserves inner products:
(Now that we are working with all functions the inner product is )
In particular this means the norm of a function is the same as the norm of its Fourier transform:The value of the Fourier transform at 0 is the integral of the original function:
. This is analogous to how the Fourier coefficient of index 0 was the average value of the function.The nice symmetry in the equations has some useful consequences: if we define the reverse signal
, then and . Also, . This is called the “duality” of the Fourier transform. It means most of the results we find about the Fourier transform have a very similar analogue for the reverse Fourier transform. For instance, it tells us the Fourier transform of is , which is itself since it’s even.If
is real-valued, then like with the Fourier coefficients, . This means if a function is even (like and ) its Fourier transform will be real-valued, and if it’s odd its transform’s values will be purely imaginary.Squeezing a function in time by a factor of
will stretch out its Fourier transform by a corresponding amount:
You might remember this principle from when it showed up in the Quantum Physics Sequence.
Shifting a function in time will add a rotating complex exponential phase to its Fourier transform: if
, then . I find this one easier to understand in its dual form: if then . If we push all the frequencies forward by , we are making each of the exponentials, and so the entire function, rotate faster at an additional frequency of .The Fourier transform of the derivative of a function is
; the other way around is .The convolution of two functions is
. The Fourier transform of a convolution is the product of the Fourier transforms: . It’s the same the other way, .
As another example, here’s the Fourier transform of a Gaussian distribution with variance
It’s another Gaussian! You can see how stretching out the function (increasing the variance) will squeeze its Fourier transform. Also, there’s no constant before the Fourier transform because then
A More General Definition
Now, it seems like the Fourier transform might be pretty useful, but our definition only works if the integral
The reason this happens can be seen if we try to apply the same convergence process we did to get
So, can we find an actual formal definition of the delta function, which lets us do Fourier transforms with it? A guy named Laurent Schwartz did it, and here’s how.
First, we define a set of “extremely nice functions”. The condition to be in this set is that the function is infinitely differentiable and it and all of its derivatives go to zero at infinity faster than any polynomial grows. This is the Schwartz space
, or the set of rapidly decreasing functions. The formal way of stating the definition is that a function is in if and only if it is in and for all ,
It turns out that all members of this class have Fourier transforms, and inverse Fourier transforms, and the (inverse or normal) Fourier transform of a member of the class is always also on the class. Of course,
and are not in the class ( is discontinuous and does not go to 0), though they do have Fourier transforms. Actually, the way in which and are not Schwartz is an instance of a more general principle: is discontinuous so its transform does not go to 0 faster than grows, a continuous but non-differentiable function would have a transform that doesn’t go to 0 faster than , and so on. Gaussian distributions are Schwartz, though.Then we define the set of tempered distributions as the dual space of
, the set of linear maps , that take a Schwartz function and return a complex number. These are our “generalized functions”.The distribution that will represent a function
will be defined as , for the functions where that integral converges for all . (This works for most functions because of how fast the Schwartz functions go down, but not something like .) It’s kind of like probability distributions, they’re defined by their “amount in each region of the real line”, for a “region” defined by a which is nearly zero everywhere but that region. (Though of course they don’t have to be normalized, or even have a finite integral over the entire real line.)We define the Dirac delta functions as
. (That integral is an abuse of notation, doesn’t have a “value at a point”, but taking and returning is a linear mapping.)We can define lots of things for distributions by analogy:
The product of a distribution with a (Schwartz) function:
so we defineThe convolution of a distribution with a function:
(which you can prove by swapping the integrals) so we define .The time-reversed distribution:
so we define .The derivative of a tempered distribution: integration by parts tells us that
, so we define . This means every distribution, even the ones that come from discontinuous functions, have a derivative; for example the derivative of is, as you might have expected, .Finally, the Fourier transform: you can show that
so we define . Same with the inverse Fourier transform. This also means that every distribution has a Fourier transform!
Almost all of the Fourier transform’s properties still hold analogously for distributions: linearity, shifting, stretching, convolution to multiplication (with Schwartz functions), derivatives and duality.
So, now can we find the Fourier transform of a complex exponential? Well, we take the function
This is just the definition of the inverse Fourier transform of
We can also show, by duality or manually, that
It’s useful to know what happens when you combine
If you multiply a function by
you get a multiple of : .If you take the convolution of a function with
, it shifts the function by : . In particular, ; is the identity of convolution! This makes sense since convolution becomes multiplication after a Fourier transform, and the identity of multiplication, 1, is the Fourier transform of .
Ш and the Sampling Theorem
In real life, we are not given signals as continuous functions defined at every point in the real line. We only get discrete samples of the signal sampled at periodic intervals. We need to then interpolate those samples into a full signal. But how do we know that our interpolation is correct, that corresponds to the original signal? We use the Nyquist-Shannon Sampling Theorem, which we will show in this section.
First, though, we will need to define a very useful distribution, called the Dirac comb or the Sha function, and represented by the Cyrillic letter Sha (Ш):
It is a series of
If you multiply it by a function
That’s another series of spikes at every
If you convolve it with
This is called the periodization of
The final thing we need to know is what the Fourier transform of
Now we are ready to work on the Sampling Theorem. Here’s how we’ll apply
This means that if
Since
Let’s do the math. We need to do the convolution of
So, if we write the sample points as
If a signal has no frequencies greater than
So we just have to… take infinitely many samples… and combine every one of them. Still doesn’t really look like a practical thing you could do with a real signal. We want to be able to only take finitely many samples and still use the Sampling theorem; for that to work, we would like
So we need to restrict ourselves only to the functions that are both time-limited and band-limited. There is only one problem: there aren’t any. Well, technically the zero function satisfies both properties, but there are no other functions that are both band-limited and time-limited. This is another manifestation of the principle where making a function concentrated in the time domain will make its Fourier transform spread out in the frequency domain: if the function is restricted to only one region of the time, its spectrum has to be spread out to infinity.
We deal with this problem by ignoring it completely. We just presume our functions are “approximately band-limited” and “approximately time-limited”, and assume the theorem holds approximately. I get the sense that in practice, we mostly use the numerical result that says that to capture frequencies up to
What if there are in fact frequencies in the data that are over half the sampling frequency? Then they show up looking like they were lower frequencies, in a phenomenon known as aliasing.
Have you ever looked at a spinning fan where it seemed like you could see the blades, rotating at a speed much lower than they should be? That’s aliasing. Your eye doesn’t sample an image often enough, so from one “sample” to the next, you look near where a blade previously was, see what is probably a different blade (that has rotated many full revolutions and ended up near where the first blade was originally), and interpret this as the original blade having moved there. [EDIT: it seems like maybe the human eye doesn’t actually work that way, and this effect is due to flickering of an electrical light. See cousin_it’s comment.]
If you try to interpolate a signal while not having enough samples, the same problem occurs. This can be solved using an anti-aliasing filter, which cuts off too-high frequencies from the signal before sampling, so that they can’t cause problems when reconstructing it after.
The Discrete Fourier Transform
Now armed with the Sampling Theorem, we can come up with a discrete version of the Fourier transform. We will suppose our signal
The sampling theorem tells us that we need to sample the signal at a rate of at least
To get the discretized
We will sample this discretized transform at regular intervals, with a step size of
Now we can substitute
The discrete Fourier transform, like the continuous version, is linear, so it can be written as matrix multiplication. It’s easiest to write out the relevant matrix if we first define
defining
Now we can understand why it was fine to take frequencies in any period of length
This is because frequency
Now, for the fundamental examples of the DFT: we will define the discrete delta and the discrete exponential as:
The
As you might have expected from the names, we have
To obtain discrete duality results, we will need is a discrete time-reversal: we will define
Applying the second of those, we can find the inverse DFT:
We can also figure out the matrix form of the inverse DFT, since its columns will be it times the
We will end by talking about the very important Fast Fourier Transform (FFT) algorithm for computing discrete Fourier transforms. The basic idea is divide-and-conquer: we compute the DFT of a vector of length
As we saw, the DFT of a vector
We want to write this in terms of
And if we take
And these sums are elements of the Fourier transforms!
Just one other thing: since
However, we can see that
So, if
And the FFT algorithm is just taking a length
Receive as input a vector
, of length for .If
is length 1, return . (Base case.)Create two vectors
and , with the even and odd indices of respectively. In Python notation: , . (This is O(n).)Run FFT for both of them, getting the vectors
and . (Recursive step.)Make a new vector
with . (Also O(n).)Finally, return the concatenation of the vectors
. (Also O(n).)
It’s that simple! And a Python implementation can be basically as short as that. This algorithm runs in O(n log n). The only problem is that it only works for vectors with a length that is a power of two; for other lengths you have to pad your vector with zeroes or use a fancier algorithm.
- ^
We’re using
for convenience. - ^
If
we have the constant function , which is periodic of every period. - ^
All the ones that are integrable and have a finite squared integral
. - ^
Actually we’re working only with the “reasonable” functions we defined in the previous footnote, which make up a subspace of that subspace.
- ^
Well, actually a Schauder basis.
- ^
That is the constant function 1, or the distribution
. - ^
Multiplied by a constant factor of
which we can ignore. - ^
We will also treat the signal itself as periodic (
), because the same thing shows up in the reverse DFT. - ^
A matrix is unitary if its conjugate transpose is equal to its inverse.
I don’t think this is right. The effect can be seen in video recordings due to shutter speed, and under some kinds of electric lights which flicker at high frequency. But if you look with your own eyes in daylight, a spinning fan will just look blurry. Human eyes have no shutter and don’t take point samples.
The same goes for sound. With sampling, a signal above the Nyquist frequency will “wrap around” and seem like it’s lower frequency. But with human ears, no. A sound above the cutoff frequency of your ears simply won’t be heard.
Here’s another fun discrepancy between Fourier/Nyquist/etc and what our senses actually do. Imagine a sawtooth wave at 1Hz. As a trigonometric series, it’s a sum of many harmonics. But of course your ears won’t hear that sound as a uniform hum of many notes at once. They’ll hear it as a periodic click once per second.
Due to things like this, it takes a little bit of practical intuition to apply Fourier/Nyquist/etc to signal processing intended for humans.
Huh, that’s weird, because I have in fact experienced what I described many times while looking at fans (physically, not through video)! The blades would seem to be spinning slowly, and sometimes reverse direction. I hadn’t ever even heard of aliasing, it was just mysterious to me.
Was it indoors or outdoors? Like I said, electric lights often have a flicker frequency (50 or 60 Hz depending on country, or some multiple of that).
The same thing can happen with musical instruments btw. When I play guitar indoors, sometimes I can see the string “wobble”, much slower than the sound frequency ought to be. I’m pretty sure that’s because of the interplay between string frequency and light flicker. In sunlight it doesn’t happen.
It was almost always indoors, so that might have been it. I just did an experiment: I turned on a fan I have in my room, while no electric lights were on (only sunlight through the window). Looking at the blades, they indeed seemed like a blur that didn’t resolve into individual blades. However, when I changed the speed of the fan, it seemed like a part of the blur resolved into a bunch of blades that slowed down their spin and changed directions. The blades didn’t fully resolve into visibility, and there were more blurry kind of visible blades than there should have been blades in the fan, but it was very clearly happening (multiple times as I changed the speed back and forth, though not every time I changed the speed). Weird.
Yeah, I can’t check right now, but my guess is some electrical effect when changing fan speed.