In this post we’ll provide a simple Fast Fourier Transform (FFT) example in C where we will sacrifice performance and external code dependencies for readability and simplicity. After understanding this simple FFT example it should trivial to modify for performance and adaptations for computer architecture. Please note this has been run on an Arduino and the performance is sub-par. I’d say this FFT implementation is probably best suited for a microcontroller.

How to use the FFT Example

In the example below we’ll perform an FFT on a complex (real + imaginary) array of 32 elements. After the FFT has completed it will write over the provided arrays as the FFT is done “in place”.

#define N_FFT 32
double signal_re[N_FFT]; // real signal array
double signal_im[N_FFT]; // imaginary signal array
fft(signal_re,signal_im,N_FFT); // fft will write over the two previous arrays

A longer example to test the FFT is provided in the sections below. These tests show more in-depth how the FFT is used.

C API Header of the FFT

We only need a single call to the function called fft and we have two helper functions rearrange and compute. The rearrange function will rearrange the elements in the array to make the butterfly computations much easier. The compute function does all the heavy lifting and is where the FFT is computed.

// file fft.h

// The arrays for the fft will be computed in place
// and thus your array will have the fft result
// written over your original data.
// Must provide an array of real and imaginary floats
// where they are both of size N
fft(float data_re[], float data_im[],const int N);

// helper functions called by the fft
// data will first be rearranged then computed
void rearrange(float data_re[],float data_im[],const int N);
void compute(float data_re[],float data_im[],const int N);

C Implementation of the FFT

Now it’s time to look at the implementation. I wished I had with me the fancy diagrams of butterflies and twiddle factors I used on paper to implement this, but it’s been took long. To recreate the diagrams the best way would be to use the debugger and a sheet to paper as it computes an FFT of size 8. Then have the theory and math alongside you.

// file fft.c
#include <stdio.h>
#include <math.h>
#include "fft.h"

void fft(float data_re[], float data_im[],const int N)

void rearrange(float data_re[],float data_im[],const unsigned int N)
  unsigned int target = 0;
  for(unsigned int position=0; position<N;position++)
    if(target>position) {
      const float temp_re = data_re[target];
      const float temp_im = data_im[target];
      data_re[target] = data_re[position];
      data_im[target] = data_im[position];
      data_re[position] = temp_re;
      data_im[position] = temp_im;
    unsigned int mask = N;
    while(target & (mask >>=1))
      target &= ~mask;
    target |= mask;

void compute(float data_re[],float data_im[],const unsigned int N)
  const float pi = -3.14159265358979323846;
  for(unsigned int step=1; step<N; step <<=1) {
    const unsigned int jump = step << 1;
    const float step_d = (float) step;
    float twiddle_re = 1.0;
    float twiddle_im = 0.0;
    for(unsigned int group=0; group<step; group++)
      for(unsigned int pair=group; pair<N; pair+=jump)
        const unsigned int match = pair + step;
        const float product_re = twiddle_re*data_re[match]-twiddle_im*data_im[match];
        const float product_im = twiddle_im*data_re[match]+twiddle_re*data_im[match];
        data_re[match] = data_re[pair]-product_re;
        data_im[match] = data_im[pair]-product_im;
        data_re[pair] += product_re;
        data_im[pair] += product_im;
      // we need the factors below for the next iteration
      // if we don't iterate then don't compute
      if(group+1 == step)
      float angle = pi*((float) group+1)/step_d;
      twiddle_re = cos(angle);
      twiddle_im = sin(angle);

Performance of the Example

Here are some numbers I got when running on an Arduino DUE 32-Bit ARM at 84MHz. Please note I’m not sure now what datatype I used. Above we have float but I’m quite sure it was changed to int. Below are some times I collected for each size.

  1. FFT Size 8 took 4.5ms
  2. FFT Size 16 took 11.5ms
  3. FFT Size 32 took 27.5ms
  4. FFT Size 64 took 61.5ms
  5. FFT Size 128 took 133.5ms

The major factors to the speed are as follows:

  1. Datatype. Going from char, int, float, double makes a huge difference especially if you don’t have a math co-processor. If you’d use this in real life, it would need to be modified and the datatype carefully considered depending on computer architecture.
  2. Multiplication. This really hurts say on an Atmel Arduino if it has to do multiplies on an int.
  3. Cosine/Sine generation. If you knew the size of the FFT then a look-up table would be a faster option than computing every time
  4. Re-arranging. Having to re-arrange the data takes good time.

Test Cases for the FFT

We will conclude with how the FFT example was tested and some test cases so the usage can be seen. Again, the type here is float and this will run well on a processor with a math co-processor. If you’re using a microcontroller then you’d want to consider type char or int.

#include <stdio.h>
#include <math.h>
#include <time.h>
#include "fft.h"

int compare_arrays(const float x[],const float y[], const unsigned int N,const float eps);
void print_arr(const float data[],const unsigned int N);
void print_test_result(int tc_re, int tc_im, int tc_num);

// We will run 4 test cases to ensure our FFT data is correct
int main(int argc, char **argv)
  int i; // loop iterator
  clock_t start, stop;
  double cpu_time_used;

  // Test Case 1
  float data1_re[8] = {0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0};
  float data1_im[8] = {7.0,6.0,5.0,4.0,3.0,2.0,1.0,0.0};
  float expected1_re[8] = {28.0,5.656,0.0,-2.343,-4.0,-5.656,-8.0,-13.656};
  float expected1_im[8] = {28.0,13.656,8.0,5.656,4.0,2.343,0.0,-5.656};
  int tc1_re = compare_arrays(data1_re,expected1_re,8,0.01);
  int tc1_im = compare_arrays(data1_im,expected1_im,8,0.01);

  // Test Case 2
  float data2_re[8] = {1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
  float data2_im[8] = {1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0};
  float expected2_re[8] = {8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0};
  float expected2_im[8] = {8.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0};
  int tc2_re = compare_arrays(data2_re,expected2_re,8,0.01);
  int tc2_im = compare_arrays(data2_im,expected2_im,8,0.01);

  // Test Case 3
  float data3_re[8] = { 1.0,-1.0, 1.0,-1.0, 1.0,-1.0, 1.0,-1.0};
  float data3_im[8] = {-1.0, 1.0,-1.0, 1.0,-1.0, 1.0,-1.0, 1.0};
  float expected3_re[8] = {0.0,0.0,0.0,0.0, 8.0,0.0,0.0,0.0};
  float expected3_im[8] = {0.0,0.0,0.0,0.0,-8.0,0.0,0.0,0.0};
  int tc3_re = compare_arrays(data3_re,expected3_re,8,0.01);
  int tc3_im = compare_arrays(data3_im,expected3_im,8,0.01);

  // Test Case 4
  float data4_re[4] = {1.0,2.0,3.0,4.0};
  float data4_im[4] = {0.0,0.0,0.0,0.0};
  float expected4_re[4] = {10.0,-2.0,-2.0,-2.0};
  float expected4_im[4] = {0.0,2.0,0.0,-2.0};
  int tc4_re = compare_arrays(data4_re,expected4_re,4,0.01);
  int tc4_im = compare_arrays(data4_im,expected4_im,4,0.01);

  // Test Case 5
  float data5_re[128];
  float data5_im[128];

  start = clock();
  for(i=0;i<100000;i++) fft(data5_re,data5_im,128);
  stop = clock();
  cpu_time_used = ((double) (stop - start)) / CLOCKS_PER_SEC;
  printf("Average time per fft %fms",cpu_time_used/1000);

void print_test_result(int tc_re, int tc_im, int tc_num)
  int res = tc_re+tc_im;
  if(res == 2) {
    printf("Test Case %d: Passed\n",tc_num);
  } else {
    printf("Test Case %d: Failed\n",tc_num);

int compare_arrays(const float x[],const float y[], const unsigned int N,const float eps)
  int result = 1;
  for(unsigned int i=0;i<N;i++)
    if(fabs(x[i]-y[i])>eps) {
      result = 0;

    printf("Expected: ");
    printf("Got     : ");

  return 1;

void print_arr(const float data[],const unsigned int N)
  for(unsigned int i=0;i<N-1;i++)