This post records what I’ve learned while studying a bit of Fourier analysis. I used this PDF, which is the lecture notes for this Stanford course. The only thing in here that is really changed from there is the derivation of the Fourier transform, where I tried to explain the way I made sense of it. (That explanation may or may not make sense.)

Fourier Series

Fourier analysis starts with the study of periodic functions. The fundamental periodic function is the complex exponential , which comes up as the solution of the harmonic oscillator equation and many other places. In the complex plane, this function starts on the horizontal axis at , then spins counterclockwise through the unit circle at an angular velocity of . (Or clockwise at that speed if is negative.)

Of course, most of our functions are real-valued, not complex. Fourier himself developed his theory of Fourier series and transforms using sines and cosines. But we have since found that the equations come out much neater if you write them in terms of complex exponentials. So we will write sines and cosines in terms of the complex exponential:

And we’ll take Fourier series and Fourier transforms of complex-valued real-argument functions, .

We said we would talk about periodic functions. A function is periodic of period if for all ; that is, after a delay of it repeats itself. If we fix , what are all the functions with period ?

First of all, the complex exponentials that have period are for integers ;^[1] in a time they move radians, going around the unit circle times, clockwise or counterclockwise depending on the sign of .^[2]

Second, we need the linear combinations of those exponentials, since they’re also periodic with period . That’s how we get sine and cosine! We also get some funkier-looking functions, like :

Since there are countably infinitely many base exponentials, we can also make infinite series with them, and these are also periodic:

And we’re done!

It turns out that all reasonable^[3] periodic functions with a given period can be defined as a series of complex exponentials. The coefficients are the Fourier series of . To find their values, we will need to do some linear algebra.

The set of complex-valued functions is a vector space, since we can add them and multiply them by complex numbers: and . The set of periodic complex-valued functions of period is a subspace of it, since a linear combination of periodic functions is periodic.^[4] We can also define an inner product in that space:

Since the functions are periodic, we can integrate over any segment of length and get the same result. Also, some people define the inner product by taking the conjugate the first function, but we’re using the same convention as our book.

I claimed that every periodic function can be written as a Fourier series. This is the same as saying that the complex exponentials are a basis^[5] for our vector space. It turns out, the complex exponentials are an orthogonal basis:

So, if we have a function that can be written as a series of complex exponentials, and take the dot product of the function and one of the exponentials, we get:

(If you’re not comfortable working with dot products of functions, you should expand those into the integral definition and see that this works, I’m just omitting it here for brevity.)

So if we know , we can calculate the Fourier coefficients:

We can also write the coefficients as , which makes our results look cooler:

The periodic function of time contains the same information as our new function of the integers . That function is the spectrum of ; it shows how is a combination of the basic harmonics, the complex exponentials.

When a function is real-valued, the coefficients and will be complex conjugates of each other, . You can add those terms together and turn them into a term and a term. So Fourier series for real functions can be equivalently written in terms of sines and cosines or complex exponentials.

A common exercise to get the feel of Fourier series is finding the coefficients for a function that is a repeated polynomial, like one of these:

You can do these if you want to practice. We’ll do one example, which looks kind of like the top left one, a square wave. It will be useful to us next section. Our function will be

in the range , and then repeated with period , which we’ll leave as a parameter. That is, is 1 in some equally spaced intervals and 0 outside of them. (We assume .) I didn’t define the function at , but that doesn’t really matter; you can pick your favorite of 0, ¹⁄₂ or 1. Let’s do the coefficients:

for ; for the second integral is trivially 1 so the coefficient is . So, the series is

These coefficients turn out to be real, and so the elements all become cosines. This always happens when the function is even (like the cosines themselves); when it’s not, some of the coefficients will be complex. The constant term, , is (for all functions) the average value of the function, its integral divided by the period.

You can play with a graph of this series here, changing how many terms of the series are added and the period . Here’s what it looks like for , adding the first 20 terms:

The Fourier Transform

Fourier series are very useful for analyzing periodic functions. But what if your function is not periodic? Can we analyze non-periodic functions in terms of complex exponentials? Well, easy, a non-periodic function is just a periodic function with period ! We will need some more adaptations, but that’s the basic idea.

In our function we have a handy parameter that changes how far apart the pulses are. It seems reasonable to expect that if we let go to infinity, all pulses other than the one at will be infinitely far away from the origin, and we will get the rect function:

So, what happens to the Fourier coefficients as we increase ? If you increase in the Desmos demonstration, you will find that the approximations get worse for the same value of (the amount of terms added). If you look back to the formula for the coefficients,

you’ll see that if is large enough, the coefficients go to 0. Why does this happen?

Let’s do another experiment. Since our function has period , obviously it will also be a periodic function of period , with two square “pulses” per period. What happens if we try to calculate the Fourier coefficients of while plugging in as the period?

Now each coefficient, which we can name , will be multiplied by the exponential . Suppose is even: (now it’s more convenient to integrate from 0 to )

But both and are periodic with period , so both of the integrals have the same value!

The even coefficients are the same as the coefficients we got before! This makes sense, since they are multiplying the exponentials with the same frequencies.

What about the odd coefficients, ? Well, the even coefficients already add up to , so the odd coefficients must be 0. When you calculate them, for the first half of the integral, you integrate one cycle of times , the frequency between and , and get a result that I would expect will probably be somewhere in between the values of the integral for these two frequencies. But then, at , the exponential has spun around the unit circle and a half times, and is at −1. So the second half of the integral has the same value, multiplied by −1; these add and cancel out into 0!

Now, for the point of all this: we’ve been treating with period as if its period was . But what we really want is to double in the definition of , so that the square pulses are twice as far apart as they previously were. This new function is the same as taking the with period and averaging it with a function that has a pulse every , but alternates between positive and negative pulses.

But why would we want to do it like that? Well, because we can: we conveniently have these new frequencies lying around that have opposite signs in the two different halves of the integral. The frequencies and will look a lot like each other near , and like the opposite of each other near . So if we add up the odd frequencies, each multiplied by values in between the coefficients of the close-by even frequency, the result will be close to near and close to near ; exactly like the picture!

Well, that’s my intuitive non-rigorous deduction, at least. Does it actually work? If you go back to the Desmos demonstration, you will see a hidden function definition in line 6; click on the circle besides it to show it. It looks like this:

This a sum of cosines with frequencies between the ones you’re summing in the normal series, weighed by the average between the Fourier coefficients of the frequencies above and below it. And as you can see, it is close to the normal sum at and to the negative of it at , like we expected! The fit becomes even better as you increase , since the frequencies become closer and closer.

So the function that is like a with period will have:

Even Fourier coefficients with values close to ;
and odd Fourier coefficients with values in between and ;

so that the even coefficients add up to half the red function above, the odd ones add to half the purple function above, and they all together add up to their average: a function that is like but with one pulse every !

That’s what happens as we increase , in general: the coefficients for each frequency decrease roughly like , and the new frequencies get coefficients “in between” those, so the new and old frequencies “cancel out” where there would previously have been repetitions.

It’s like we’re “spreading out” the coefficients around , to and as they appear, then those spread out to the ones between them and so on. To simplify things and take the limit as goes to infinity, we will turn our spectrum, the amounts of each frequency in , into a “frequency density distribution”: something like a probability density distribution or a charge density distribution in physics.

This distribution will be a function of the frequency ,which we’ll call ; we’ll call the distribution . It will measure the “amount of coefficient per unit frequency”: instead of adding the different values of , we’ll integrate . So this is how our equation for how is made of exponentials changes:

We’re adding rather than the raw , the accounting for the decrease in the coefficients as we increase and get more and more granular frequencies. This decrease is proportional to , so to get the actual values of , rather than its integral, we need to divide by in the formula for the Fourier series coefficients:

This function is known as the Fourier transform of .

Using the Fourier Transform

Now our equations are these:

Very nice and symmetrical. If is a function of time, is a function of frequency; we say is a signal and is its spectrum. The Fourier transform can also written as or . (We will use the former but not the latter.) We also refer to as the inverse Fourier transform of , and write .

Let’s go back to our square wave function! With an infinite period there is only one positive pulse, and it becomes the rect function:

The Fourier coefficients of our periodic function were this:

If we divide by and let go to infinity as goes to , we get:

a function called . Let’s see if this matches our formula for the Fourier transform of :

Just as expected. Unfortunately, the reverse integral is tricky so we won’t be able to verify that side.

Some useful properties of the Fourier transform, all of which are easy to prove from the definition:

It’s linear:
It’s unitary, meaning it preserves inner products:
(Now that we are working with all functions the inner product is )
In particular this means the norm of a function is the same as the norm of its Fourier transform:
The value of the Fourier transform at 0 is the integral of the original function: . This is analogous to how the Fourier coefficient of index 0 was the average value of the function.
The nice symmetry in the equations has some useful consequences: if we define the reverse signal , then and . Also, . This is called the “duality” of the Fourier transform. It means most of the results we find about the Fourier transform have a very similar analogue for the reverse Fourier transform. For instance, it tells us the Fourier transform of is , which is itself since it’s even.
If is real-valued, then like with the Fourier coefficients, . This means if a function is even (like and ) its Fourier transform will be real-valued, and if it’s odd its transform’s values will be purely imaginary.
Squeezing a function in time by a factor of will stretch out its Fourier transform by a corresponding amount:

You might remember this principle from when it showed up in the Quantum Physics Sequence.

Shifting a function in time will add a rotating complex exponential phase to its Fourier transform: if , then . I find this one easier to understand in its dual form: if then . If we push all the frequencies forward by , we are making each of the exponentials, and so the entire function, rotate faster at an additional frequency of .
The Fourier transform of the derivative of a function is ; the other way around is .
The convolution of two functions is . The Fourier transform of a convolution is the product of the Fourier transforms: . It’s the same the other way, .

As another example, here’s the Fourier transform of a Gaussian distribution with variance :

It’s another Gaussian! You can see how stretching out the function (increasing the variance) will squeeze its Fourier transform. Also, there’s no constant before the Fourier transform because then will be 1, which is the integral of the original function. So if we take the variance to be , the resulting function will be its own Fourier transform!

A More General Definition

Now, it seems like the Fourier transform might be pretty useful, but our definition only works if the integral converges. And it only does if goes down to 0 sufficiently quickly at infinity. For example, take the complex exponential itself, ; the integral is , which oscillates indefinitely for , and goes to infinity for . What good is our way of decomposing functions into a distribution of complex exponentials if it can’t even decompose the complex exponentials themselves?

The reason this happens can be seen if we try to apply the same convergence process we did to get . When we increase the period we use for acomplex exponential, the Fourier coefficients don’t “spread out”; they keep being 1 at that complex exponential itself and 0 everywhere else. So the distribution will go to infinity at the exponential’s frequency , and to 0 elsewhere: a Dirac’s delta function . This does kind of match the behavior of the integral, but it’s not a rigorous derivation.

So, can we find an actual formal definition of the delta function, which lets us do Fourier transforms with it? A guy named Laurent Schwartz did it, and here’s how.

First, we define a set of “extremely nice functions”. The condition to be in this set is that the function is infinitely differentiable and it and all of its derivatives go to zero at infinity faster than any polynomial grows. This is the Schwartz space , or the set of rapidly decreasing functions. The formal way of stating the definition is that a function is in if and only if it is in and for all ,

It turns out that all members of this class have Fourier transforms, and inverse Fourier transforms, and the (inverse or normal) Fourier transform of a member of the class is always also on the class. Of course, and are not in the class ( is discontinuous and does not go to 0), though they do have Fourier transforms. Actually, the way in which and are not Schwartz is an instance of a more general principle: is discontinuous so its transform does not go to 0 faster than grows, a continuous but non-differentiable function would have a transform that doesn’t go to 0 faster than , and so on. Gaussian distributions are Schwartz, though.
Then we define the set of tempered distributions as the dual space of , the set of linear maps , that take a Schwartz function and return a complex number. These are our “generalized functions”.
The distribution that will represent a function will be defined as , for the functions where that integral converges for all . (This works for most functions because of how fast the Schwartz functions go down, but not something like .) It’s kind of like probability distributions, they’re defined by their “amount in each region of the real line”, for a “region” defined by a which is nearly zero everywhere but that region. (Though of course they don’t have to be normalized, or even have a finite integral over the entire real line.)
We define the Dirac delta functions as . (That integral is an abuse of notation, doesn’t have a “value at a point”, but taking and returning is a linear mapping.)
We can define lots of things for distributions by analogy:
- The product of a distribution with a (Schwartz) function: so we define
- The convolution of a distribution with a function: (which you can prove by swapping the integrals) so we define .
- The time-reversed distribution: so we define .
- The derivative of a tempered distribution: integration by parts tells us that , so we define . This means every distribution, even the ones that come from discontinuous functions, have a derivative; for example the derivative of is, as you might have expected, .
- Finally, the Fourier transform: you can show that so we define . Same with the inverse Fourier transform. This also means that every distribution has a Fourier transform!
Almost all of the Fourier transform’s properties still hold analogously for distributions: linearity, shifting, stretching, convolution to multiplication (with Schwartz functions), derivatives and duality.

So, now can we find the Fourier transform of a complex exponential? Well, we take the function , define it as a distribution , and the Fourier transform is:

This is just the definition of the inverse Fourier transform of at , which is . So , which means , just as we expected. So, because of sine and cosine’s representation as combinations of complex exponentials, we have

We can also show, by duality or manually, that . In particular, .^[6] Just like squeezing a function will stretch out its Fourier transform, the most squeezed possible function () will have the most stretched out possible Fourier transform (1).

It’s useful to know what happens when you combine with a function:

If you multiply a function by you get a multiple of : .
If you take the convolution of a function with , it shifts the function by : . In particular, ; is the identity of convolution! This makes sense since convolution becomes multiplication after a Fourier transform, and the identity of multiplication, 1, is the Fourier transform of .

Ш and the Sampling Theorem

In real life, we are not given signals as continuous functions defined at every point in the real line. We only get discrete samples of the signal sampled at periodic intervals. We need to then interpolate those samples into a full signal. But how do we know that our interpolation is correct, that corresponds to the original signal? We use the Nyquist-Shannon Sampling Theorem, which we will show in this section.

First, though, we will need to define a very useful distribution, called the Dirac comb or the Sha function, and represented by the Cyrillic letter Sha (Ш):

Ш

It is a series of spikes, at 0 and at every , where is a parameter. $Ш$ is then periodic with period (defined for distributions by analogy, as always). $Ш$ by itself isn’t that useful, what is useful is when you combine it with a function.

If you multiply it by a function you get this:

$Ш$

That’s another series of spikes at every , but the spikes’ weights are the values of at those points. This means the product only contains some of the information in , specifically its values at every ; this can be seen as a sampling of at a rate of one sample every seconds.

If you convolve it with instead, you get:

$Ш$

This is called the periodization of , because it has period for any function . For each point, you take all the other points displaced from it by a multiple of , the points where the value of the function should be equal for it to be periodic, and add them all up, and that’s the new value. If is 0 outside of one segment of length , this is the same as taking that segment and repeating it every ; for example, $Ш$ is our original period- function!

The final thing we need to know is what the Fourier transform of $Ш$ is. The derivation is a bit long and complicated so I won’t do it here, if you’re curious you can go read the book, but it turns out that $Ш Ш$ , the Sha function with period 1 is its own Fourier transform! In general, $Ш Ш$ .

Now we are ready to work on the Sampling Theorem. Here’s how we’ll apply $Ш$ : We suppose we have samples of at every interval of time , so our knowledge is the product $Ш$ . The Fourier transform of that product will be the convolution of with a different Sha function $Ш$ , which means it’s a periodization of with period .^[7]

This means that if happens to be 0 outside of a region of length , that periodization will be just repeated again and again, and we can take one of these regions and use that to know the value of , and invert to get ! So, to do this we will suppose is band-limited, meaning its spectrum contains only frequencies below a certain cutoff value ; that is, we are restricted to functions where for . For example, is band-limited with limit , since its spectrum is .

Since is non-zero from to , a range of length , our condition for this to work is that . This number, , which we’ll call , is a frequency known as the sampling rate.

Let’s do the math. We need to do the convolution of with $Ш$ , and take the values of to , and 0 otherwise. We will introduce the stretched rect function, a generalization of our . That will be a function that is 1 in an interval of length around the origin, and 0 outside of it. We can write this as . So our band-limiting condition is that , and from it we deduced that $Ш$ . Finally, take the inverse Fourier transform of both sides, which will get on one side, and a function of the samples of on the other:

$Ш Ш$

$Ш$

So, if we write the sample points as , we have found this equation:

If a signal has no frequencies greater than , and we sample at a rate greater than every , we can rebuild the original signal from the samples, by combining shifted s weighed by the samples! This is the Sampling Theorem we were looking for.

So we just have to… take infinitely many samples… and combine every one of them. Still doesn’t really look like a practical thing you could do with a real signal. We want to be able to only take finitely many samples and still use the Sampling theorem; for that to work, we would like to be zero when is large enough, so that we only need to add some of the sum’s terms. We’d like to be time-limited, that is, for there to be a time such that for . This is like being band-limited, but in the time domain instead of in the frequency domain.

So we need to restrict ourselves only to the functions that are both time-limited and band-limited. There is only one problem: there aren’t any. Well, technically the zero function satisfies both properties, but there are no other functions that are both band-limited and time-limited. This is another manifestation of the principle where making a function concentrated in the time domain will make its Fourier transform spread out in the frequency domain: if the function is restricted to only one region of the time, its spectrum has to be spread out to infinity.

We deal with this problem by ignoring it completely. We just presume our functions are “approximately band-limited” and “approximately time-limited”, and assume the theorem holds approximately. I get the sense that in practice, we mostly use the numerical result that says that to capture frequencies up to in a signal, the sampling rate has to be above . Sincs are kind of unwieldy to use anyway since they themselves spread out to infinity. Anyway, the result is still foundational and, in my opinion, really cool.

What if there are in fact frequencies in the data that are over half the sampling frequency? Then they show up looking like they were lower frequencies, in a phenomenon known as aliasing.

Have you ever looked at a spinning fan where it seemed like you could see the blades, rotating at a speed much lower than they should be? That’s aliasing. Your eye doesn’t sample an image often enough, so from one “sample” to the next, you look near where a blade previously was, see what is probably a different blade (that has rotated many full revolutions and ended up near where the first blade was originally), and interpret this as the original blade having moved there. [EDIT: it seems like maybe the human eye doesn’t actually work that way, and this effect is due to flickering of an electrical light. See cousin_it’s comment.]

If you try to interpolate a signal while not having enough samples, the same problem occurs. This can be solved using an anti-aliasing filter, which cuts off too-high frequencies from the signal before sampling, so that they can’t cause problems when reconstructing it after.

The Discrete Fourier Transform

Now armed with the Sampling Theorem, we can come up with a discrete version of the Fourier transform. We will suppose our signal is basically zero outside of an interval of length (say, 0 to ), and that its Fourier transform is also basically zero outside of an interval of length (say, 0 to ). It might seem strange to not take frequencies from an interval around zero (like ) but as we’ll see, these will be equivalent because frequencies like and will look the same for aliasing reasons.

The sampling theorem tells us that we need to sample the signal at a rate of at least , with . Since our total time is , this means we can take samples. We’ll take them at times . The discretized will be a vector with the values of these samples, .

To get the discretized , we will take the Fourier transform of a “discrete distribution” version of , which has spikes at the sample points with weights :

We will sample this discretized transform at regular intervals, with a step size of (by the sampling theorem applied backwards), so the discretized Fourier transform will also have elements, at the points . So the discrete Fourier transform of the vector will have the values:

Now we can substitute , and , and finally , and get the discrete Fourier transform (DFT):

The discrete Fourier transform, like the continuous version, is linear, so it can be written as matrix multiplication. It’s easiest to write out the relevant matrix if we first define , the fundamental th-order root of unity. Then we have:

defining as a matrix.

Now we can understand why it was fine to take frequencies in any period of length : if we add to one of the frequency indices we get , which multiplies each term in the sum by , which is 1, so . The spectrum is periodic!^[8]

This is because frequency turns radians around the unit circle for each sampling step . This means frequency number , for example, will at each step make a complete round trip around the unit circle, plus one additional turn of radians, and so will arrive at the same place as the first frequency. The two of them are indistinguishable; that’s where aliasing comes from.

Now, for the fundamental examples of the DFT: we will define the discrete delta and the discrete exponential as:

$𝛅 𝛚$

The $𝛅$ are just the basis vectors; $𝛚$ is equal to the k-th column or the k-th row of the matrix . $𝛚$ is the vector of ones . (We’re using componentwise exponentiation for the $𝛚$ , starting from $𝛚$ .)

As you might have expected from the names, we have $𝛅 𝛚$ . In particular, $𝛅$ . However, unlike what you might have expected, $𝛚$ is not equal to $𝛅$ , but instead to $𝛅$ . This happens because, as it turns out, the complex exponentials are orthogonal but not orthonormal, since they have have norm :

$𝛚 𝛚$

To obtain discrete duality results, we will need is a discrete time-reversal: we will define . Since we treat signals as periodic of period , we can also write . It’s easy to show that $𝛅 𝛅$ ; also, compellingly, $𝛚 𝛚$ . We can use these facts to obtain the duality results:

Applying the second of those, we can find the inverse DFT:

We can also figure out the matrix form of the inverse DFT, since its columns will be it times the $𝛅$ :

$𝛅 𝛚 𝛚$

the conjugate transpose of times ! is almost unitary.^[9]

We will end by talking about the very important Fast Fourier Transform (FFT) algorithm for computing discrete Fourier transforms. The basic idea is divide-and-conquer: we compute the DFT of a vector of length by doing some operations, then computing the DFT of two smaller vectors of length .

As we saw, the DFT of a vector of length is

We want to write this in terms of , to get length- transforms. , so if is even, we write , so that , and if is odd, we write ), so that . So we separate the sum above into two:

And if we take to be the vector of the even indices of , and to be the odd indices of , we can write it like this:

And these sums are elements of the Fourier transforms!

Just one other thing: since has length , will go past and keep going until it reaches , so this indexing of and is treating them as periodic sequences where and the same for . So we would want to write this as being equal to plus the componentwise product of $𝛚$ and , but we can’t, because $𝛚$ has length , not .

However, we can see that

$𝛚 𝛚$

So, if is the identity matrix, and is a diagonal matrix with , then we can write in block matrix notation:

And the FFT algorithm is just taking a length vector and repeating that times! Here it is in pseudocode:

Receive as input a vector , of length for .
If is length 1, return . (Base case.)
Create two vectors and , with the even and odd indices of respectively. In Python notation: , . (This is O(n).)
Run FFT for both of them, getting the vectors and . (Recursive step.)
Make a new vector with . (Also O(n).)
Finally, return the concatenation of the vectors . (Also O(n).)

It’s that simple! And a Python implementation can be basically as short as that. This algorithm runs in O(n log n). The only problem is that it only works for vectors with a length that is a power of two; for other lengths you have to pad your vector with zeroes or use a fancier algorithm.

^
We’re using for convenience.
^
If we have the constant function , which is periodic of every period.
^
All the ones that are integrable and have a finite squared integral .
^
Actually we’re working only with the “reasonable” functions we defined in the previous footnote, which make up a subspace of that subspace.
^
Well, actually a Schauder basis.
^
That is the constant function 1, or the distribution .
^
Multiplied by a constant factor of which we can ignore.
^
We will also treat the signal itself as periodic (), because the same thing shows up in the reverse DFT.
^
A matrix is unitary if its conjugate transpose is equal to its inverse.

Notes on Fourier Analysis