Lloyd Rochester's Geek Blog

Learn the FFT

In this post I’d like to attempt to explain how the Fast Fourier Transform algorithm works. This post is inspired by many attempts others have made that don’t really do a good job explaining the FFT. Hopefully, I can do a better job. For explanation we’ll use the Cooley–Tukey algorithm which is the most common.

FFT Butterfly Diagram

Above is the so-called Butterfly Diagram which we will arrive at. Refer to the Example FFT Computation in C to see a program that will compute the FFT.

Table of Contents

DFT Definition

The FFT algorithm makes computation of the Discrete Fourier Transform (DFT) more efficient. Much more efficient depending on the size of the input array the computation is being performed on.

Here is the how we’ll define the DFT.

Xk=n=0N1xne2πikn/N=n=0N1xnWNkn

where WNkn=e2πikn/N. Having WNkn will make things easier to keep track of later. The W is short for weight. It’s also referred to as a twiddle factor.

The input to the DFT is an array of complex valued numbers. Each element xn has a real and imaginary part.

FFT Example of size 8

In my opinion the best to way to understand the FFT is using an example of size 8 or where N=8. The reason for this is for smaller N we don’t have enough stages, and for larger N the math becomes untamely. Stand by as we walk through a concrete example.

Xk=n=081xnW8kn

Also, in matrix form for N=8. Visualizing the matrix helps for understanding.

[X0X1X2X3X4X5X6X7]=[W80W80W80W80W80W80W80W80W80W81W82W83W84W85W86W87W80W82W84W86W80W82W84W86W80W83W86W81W84W87W82W85W80W84W80W84W80W84W80W84W80W85W82W87W84W81W86W83W80W86W84W82W80W86W84W82W80W87W86W85W84W83W82W81][x0x1x2x3x4x5x6x7]

Hopefully, I have all those constants correct! Check out the symmetry when looking down the columns of the W matrix! It’s pretty cool.

Why do we need to be efficient?

To do this matrix multiply we need 82=64 multiplies. In Big O Notation this would be O(N2). This is because for each Xn we need to multiply 8 times. Realistically, it’s 4×82=256 since to multiply complex numbers we need 4 multiplies for the real and imaginary parts.

The reason we need to be concerned with multiplication is that computer processors are slower at multiplication than other mathematical operations such as addition. Long ago they used to be much slower at multiplication than say addition, however, they’ve gotten faster so this point isn’t as strong as it used to be.

Let’s take a realistic example of audio sampled at 48kHz. Let’s say we wanted to do an FFT every 21ms where N=1024. We then would have 4×10242=4,194,304 multiplications every 21ms. The 4 here is for the multiplication of complex numbers. That’s not a small number of multiplications, they add up very quickly.

Take for example an ARM processor. We can swag a multiply accumulate instruction of a floating point number takes 3 clock cycles - that’s a very commendable number of instructions for a multiply. Let’s call this 3 floating point operations or FLOPS. For each second we’d have to do this roughly 48 times ( 48×0.021ms1s ) and that is 3×48×4,194,304=603,979,776 instructions per second just for multiplication. This is roughly 600 Mega FLOPS. For a processor running at 1GHz that is a heavy load just to run a 1024 size DFT in real time.

The FFT can do this computation much more efficiently. Instead of N2 it will drastically reduce operations to O(Nlog2N). For our example of 1024 we would now have 4×1024×10=40,960 multiplies for the 21ms. We’d then have a total of 40,960×48×3=5,898,240 multiplies. This is roughly 6 Mega FLOPS which is a factor 10 smaller than what we had above without using the FFT.

Before the Math Starts

The Cooley-Tukey FFT algorithm breaks the DFT into smaller DFTs. Sometimes this is referred to as recursion. There are three parts to this algorithm that we should highlight before we get to the math.

  1. We will divide the even and odd indices of our input xn as many times as it takes to get tuples
  2. When we compute each part of Xn we will do this in stages where we can use computations from previous stages.
  3. We will exploit the symmetry of the cosine and sine functions to reduce our work.

Breaking down a time series into even and odd parts is also called “Decimation in Time”. For audio say our input to the FFT is a series of samples in time.

Breaking the DFT into even and odd parts

Let’s start with an example with an input of size 8. Watch me break this down into 3 stages until we arrive with 4 pairs of tuples:

(x0x1x2x3x4x5x6x7)

(x0x2x4x6)(x1x3x5x7)

(x0x4)(x2x6)(x1x5)(x3x7)

As you can see there is a logarithmic pattern here. Each time we cut the elements in each group by half.

Derive Decimation in time for the General Case

Let’s now take the DFT function and break it down into two DFTs by even and odd numbered indices:

n=0N1xne2πikn/N=m=0N/21x2me2πik(2m)/N+m=0N/21x2m+1e2πik(2m+1)/N

n=0N1xne2πikn/N=m=0N/21x2me2πik(2m)/N+e2πik/Nm=0N/21x2m+1e2πik(2m)/N

WNkn=e2πikn/N

n=0N1xnWNkn=m=0N/21x2mWN/2km+WNkm=0N/21x2m+1WN/2km

Let us summarize this into even Ek and odd parts Ok.

Xk=Ek+WNkOk

As an exercise to the reader we can also drive the following. Xk+N2=EkWNkOk

This leaves us with the following:

Xk=Ek+WNkOk Xk+N2=EkWNkOk

Let’s stop to look at what we’ve derived.

  1. The DFT Xk we broke down to an even Ek and odd part Ok
  2. The DFT is symmetric as the Xk and Xk+N2 differs by only a minus sign on the odd part Ok.
  3. The odd part is multiped by a constant WNk that doesn’t depend on n or the inputs xn.

Now we can keep breaking the even and odd parts further until we cannot do so anymore. This will leave us with groups of tuples.

Example FFT of size 8

Let’s take an example FFT of size 8. Again, size 8 is the Goldilocks example as it gives us 3 stages to work with and the math is manageable. The algorithm will break into 3 stages since log28=3. We will start with the smallest pair as the algorithm is recursive. This will make sense later when we look at the butterfly diagram.

Recursive Breakdown

Let’s use 3 steps to break down the FFT. As you’ll see 3-stages is the farthest we can break it down.

Xk=n=081xnW8kn=m=041x2mW4km+W8km=041x2m+1W4km=(p=021x2pW2kp+W4kp=021x2p+1W2kp)+W8k(p=021x2pW2kp+W4kp=021x2p+1W2kp)=(x0+x4W2k)+W4k(x2+x6W2k)+W8k((x1+x5W2k)+W4k(x3+x7W2k))

For the last equation we used W20=1 and p=2m for example 2p+1=4m+2 for a fully broken down DFT of size 8.

Stage 1: Four sets of Tuples

For our first stage we will compute, and store in temporary locations each of the inner most expressions:

S10=(x0+x4W2k)S11=(x2+x6W2k)S12=(x1+x5W2k)S13=(x3+x7W2k)

As we’ll see later the weights W2k only take on 2 values, leaving 8 values to store.

Stage 2: Two parts of Four

S20=S10+W4kS11S21=S12+W4kS13

Although we have two equations Stage 2, the weight W4k can take on 4 different values. Again, we have 8 values to store.

Stage 3: One part of Eight

S30=S20+W8kS21

For our third stage we only have one equation, however W8k can take on 8 values and thus we have 8 values to store.

Exploit the Symmetry of the Weights

Now we need to look at the following weights W. We can think of these as the values that are taken on as we walk around the unit circle in steps. The size of the steps depends on the subscript.

For our example we have 3 different weights to analyze: W2kW4kW8k

Around the Unit Circle in half’s

When we have W2k we only have 2 values as we walk around the unit circle in half’s. We are hopping between these two values.

Going around the Unit Circle in two hops

Let’s break this down into 8 values as we have for our example.

W2k=eπik=(1)k=[11111111]

This weight only takes on two values W2k=∈{1,1}.

Around the Unit circle in fourths

For the weights W4k we are going around the unit circle in fourths. We will have 4 different values we can have around the unit circle.

Going around the unit circle in four hops

Let’s show this for each value of k.

W4k=eπik/2=[1i1i1i1i]

This weight takes on 4 values W4k=∈{1,1,i,i}. Note, we only have really 2 values that differ by sign.

Around the Unit circle in eights

For the weights W8k we are going around the unit circle 8 times. We will have 8 different values we can have around the unit circle.

Going around the unit circle in eight hops

Let’s show this for each value of k.

W8k=eπik/4=[112(1i)i12(1i)112(1+i)i12(1+i)]

This weight takes on 8 values. However, we really only have 3 values when we look at the symmetry of the upper left quadrant and the other 3 quadrants.

Butterfly Diagram

Let’s take our fully broken down equation:

Xk=(x0+x4W2k)+W4k(x2+x6W2k)+W8k((x1+x5W2k)+W4k(x3+x7W2k))

Let’s take some example values of the result from bins of the FFT, X0,X1,X4.

X0=(x0+x4)+(x2+x6)+(x1+x5)+(x3+x7)X1=(x0x4)i(x2x6)+12(1i)((x1x5)i(x3x7))X4=(x0+x4)+(x2+x6)((x1+x5)+(x3+x7))

Note, the symmetry of X0 and X4.

Now, we can take the equations above to leap to the butterfly diagram below. For each stage we can utilize the results from previous stages to reduce computational load.

FFT Butterfly Diagram

The butterfly diagram above has the weights or twiddle factors we have represented by W left out. However, to arrive at each stage we use the examples above. The ingenious aspect of this algorithm is that the we can compute partial parts of each Xk in each stage and use these computations for later stages.

Contact Me for any issues or improvements that can be made to this post.