The Discrete Fourier Transform: From Math to FFT

The Discrete Fourier Transform shows up everywhere — audio compression, image processing, polynomial multiplication, solving PDEs numerically. I kept using it as a black box until I sat down and worked through why it’s fast. This is that working-through.

The Definition

Given a sequence of $n$ complex numbers $x_0, x_1, \ldots, x_{n-1}$, the DFT produces another sequence $X_0, X_1, \ldots, X_{n-1}$ defined by:

$$X_k = \sum_{j=0}^{n-1} x_j , e^{-2\pi i jk/n} \qquad k = 0, 1, \ldots, n-1$$

Each output $X_k$ is a dot product of the input signal with a complex sinusoid of frequency $k/n$. The magnitude $|X_k|$ tells you how much of that frequency is present in the signal; the phase $\arg(X_k)$ tells you the offset.

The naïve implementation computes each of the $n$ outputs as a sum of $n$ terms — $O(n^2)$ total. For $n = 10^6$ that’s $10^{12}$ operations. Unusable.

import numpy as np

def dft_naive(x):
    n = len(x)
    X = np.zeros(n, dtype=complex)
    for k in range(n):
        for j in range(n):
            X[k] += x[j] * np.exp(-2j * np.pi * j * k / n)
    return X

The Key Observation

Let $\omega_n = e^{-2\pi i/n}$ (the primitive $n$-th root of unity). The DFT becomes:

$$X_k = \sum_{j=0}^{n-1} x_j , \omega_n^{jk}$$

Now split the sum into even- and odd-indexed terms:

$$X_k = \sum_{j=0}^{n/2-1} x_{2j} , \omega_n^{2jk} + \sum_{j=0}^{n/2-1} x_{2j+1} , \omega_n^{(2j+1)k}$$

Since $\omega_n^{2jk} = \omega_{n/2}^{jk}$, the first sum is the DFT of the even-indexed elements evaluated at $k$. The second is the DFT of the odd-indexed elements, multiplied by $\omega_n^k$:

$$X_k = E_k + \omega_n^k \cdot O_k$$

And crucially — because $\omega_n^{k + n/2} = -\omega_n^k$ — the second half of the outputs comes for free:

$$X_{k+n/2} = E_k - \omega_n^k \cdot O_k$$

Why ω^(k + n/2) = −ω^k

This is the identity that makes the FFT work. Starting from:

$$\omega_n^{k+n/2} = e^{-2\pi i (k + n/2)/n} = e^{-2\pi i k/n} \cdot e^{-2\pi i \cdot \frac{1}{2}} = \omega_n^k \cdot e^{-\pi i}$$

Since $e^{-\pi i} = \cos(-\pi) + i\sin(-\pi) = -1$, we get:

$$\omega_n^{k+n/2} = -\omega_n^k$$

This means once you’ve computed $E_k$ and $\omega_n^k \cdot O_k$ for $k = 0, \ldots, n/2 - 1$, you get both $X_k$ and $X_{k+n/2}$ from the same values — the second half of the output costs nothing extra beyond a sign flip.

The Recursive Algorithm

This decomposition applies recursively. At each level, split into even/odd, solve two half-sized DFTs, combine in $O(n)$. The recursion runs $\log_2 n$ levels deep, giving $O(n \log n)$ total.

def fft(x):
    n = len(x)
    if n == 1:
        return x

    even = fft(x[::2])   # DFT of even-indexed elements
    odd  = fft(x[1::2])  # DFT of odd-indexed elements

    # twiddle factors: ω_n^k for k = 0..n/2-1
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)

    return np.concatenate([
        even + twiddle * odd,
        even - twiddle * odd,
    ])

The difference in practice is stark:

$n$	Naïve ops	FFT ops	Ratio
1,024	1,048,576	10,240	102×
65,536	4,294,967,296	1,048,576	4,096×
1,048,576	$10^{12}$	20,971,520	47,684×

import time

n = 2**16
x = np.random.randn(n)

t0 = time.perf_counter()
X_naive = dft_naive(x)
print(f"Naïve: {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
X_fft = fft(x)
print(f"FFT:   {time.perf_counter() - t0:.4f}s")

print(f"Max error: {np.max(np.abs(X_naive - X_fft)):.2e}")

What the Output Means

The DFT of a real-valued signal of length $n$ sampled at rate $f_s$ Hz gives frequency bins at:

$$f_k = \frac{k \cdot f_s}{n} \qquad k = 0, 1, \ldots, n/2$$

The bin at $k=0$ is the DC component (mean of the signal). Bins $k = 1$ to $k = n/2 - 1$ are positive frequencies up to the Nyquist limit $f_s/2$. The upper half of the output is the complex conjugate mirror of the lower half (for real inputs) and usually discarded.

The Nyquist limit and aliasing

Sampling a signal at rate $f_s$ means you can only faithfully represent frequencies up to $f_s/2$ — the Nyquist frequency. This isn’t a limitation of the DFT specifically; it’s a fundamental constraint on discrete sampling.

If the signal contains energy above $f_s/2$, those frequencies fold back down (“alias”) into the representable range and corrupt the lower frequencies. A 1100 Hz tone sampled at 2000 Hz looks identical in the DFT to a 900 Hz tone.

Formally: a frequency $f > f_s/2$ aliases to $f_s - f$. You can’t detect or correct aliasing after the fact — information is permanently lost. This is why audio is sampled at 44.1 kHz (covering up to ~22 kHz, safely above human hearing) rather than at a lower rate.

Implementation Note

The recursive version above is clean but slow in practice due to Python overhead and array allocations at each level. NumPy’s np.fft.fft uses an iterative Cooley-Tukey implementation in C with SIMD optimisations — it’s typically 100–1000× faster than the recursive Python version above, even though both are $O(n \log n)$.

For production use, always reach for np.fft.fft. The recursive implementation is for understanding the structure; the library implementation is for running it.

Next I want to look at the 2D DFT used in image compression — it’s the same idea applied along rows then columns, and it’s the basis of JPEG.