The Discrete Fourier Transform: From Math to FFT
The Discrete Fourier Transform shows up everywhere — audio compression, image processing, polynomial multiplication, solving PDEs numerically. I kept using it as a black box until I sat down and worked through why it’s fast. This is that working-through.
The Definition
Given a sequence of $n$ complex numbers $x_0, x_1, \ldots, x_{n-1}$, the DFT produces another sequence $X_0, X_1, \ldots, X_{n-1}$ defined by:
$$X_k = \sum_{j=0}^{n-1} x_j , e^{-2\pi i jk/n} \qquad k = 0, 1, \ldots, n-1$$
Each output $X_k$ is a dot product of the input signal with a complex sinusoid of frequency $k/n$. The magnitude $|X_k|$ tells you how much of that frequency is present in the signal; the phase $\arg(X_k)$ tells you the offset.
The naïve implementation computes each of the $n$ outputs as a sum of $n$ terms — $O(n^2)$ total. For $n = 10^6$ that’s $10^{12}$ operations. Unusable.
import numpy as np
def dft_naive(x):
n = len(x)
X = np.zeros(n, dtype=complex)
for k in range(n):
for j in range(n):
X[k] += x[j] * np.exp(-2j * np.pi * j * k / n)
return X
The Key Observation
Let $\omega_n = e^{-2\pi i/n}$ (the primitive $n$-th root of unity). The DFT becomes:
$$X_k = \sum_{j=0}^{n-1} x_j , \omega_n^{jk}$$
Now split the sum into even- and odd-indexed terms:
$$X_k = \sum_{j=0}^{n/2-1} x_{2j} , \omega_n^{2jk} + \sum_{j=0}^{n/2-1} x_{2j+1} , \omega_n^{(2j+1)k}$$
Since $\omega_n^{2jk} = \omega_{n/2}^{jk}$, the first sum is the DFT of the even-indexed elements evaluated at $k$. The second is the DFT of the odd-indexed elements, multiplied by $\omega_n^k$:
$$X_k = E_k + \omega_n^k \cdot O_k$$
And crucially — because $\omega_n^{k + n/2} = -\omega_n^k$ — the second half of the outputs comes for free:
$$X_{k+n/2} = E_k - \omega_n^k \cdot O_k$$
Why ω^(k + n/2) = −ω^k
This is the identity that makes the FFT work. Starting from:
$$\omega_n^{k+n/2} = e^{-2\pi i (k + n/2)/n} = e^{-2\pi i k/n} \cdot e^{-2\pi i \cdot \frac{1}{2}} = \omega_n^k \cdot e^{-\pi i}$$
Since $e^{-\pi i} = \cos(-\pi) + i\sin(-\pi) = -1$, we get:
$$\omega_n^{k+n/2} = -\omega_n^k$$
This means once you’ve computed $E_k$ and $\omega_n^k \cdot O_k$ for $k = 0, \ldots, n/2 - 1$, you get both $X_k$ and $X_{k+n/2}$ from the same values — the second half of the output costs nothing extra beyond a sign flip.
The Recursive Algorithm
This decomposition applies recursively. At each level, split into even/odd, solve two half-sized DFTs, combine in $O(n)$. The recursion runs $\log_2 n$ levels deep, giving $O(n \log n)$ total.
def fft(x):
n = len(x)
if n == 1:
return x
even = fft(x[::2]) # DFT of even-indexed elements
odd = fft(x[1::2]) # DFT of odd-indexed elements
# twiddle factors: ω_n^k for k = 0..n/2-1
twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
return np.concatenate([
even + twiddle * odd,
even - twiddle * odd,
])
The difference in practice is stark:
| $n$ | Naïve ops | FFT ops | Ratio |
|---|---|---|---|
| 1,024 | 1,048,576 | 10,240 | 102× |
| 65,536 | 4,294,967,296 | 1,048,576 | 4,096× |
| 1,048,576 | $10^{12}$ | 20,971,520 | 47,684× |
import time
n = 2**16
x = np.random.randn(n)
t0 = time.perf_counter()
X_naive = dft_naive(x)
print(f"Naïve: {time.perf_counter() - t0:.2f}s")
t0 = time.perf_counter()
X_fft = fft(x)
print(f"FFT: {time.perf_counter() - t0:.4f}s")
print(f"Max error: {np.max(np.abs(X_naive - X_fft)):.2e}")
What the Output Means
The DFT of a real-valued signal of length $n$ sampled at rate $f_s$ Hz gives frequency bins at:
$$f_k = \frac{k \cdot f_s}{n} \qquad k = 0, 1, \ldots, n/2$$
The bin at $k=0$ is the DC component (mean of the signal). Bins $k = 1$ to $k = n/2 - 1$ are positive frequencies up to the Nyquist limit $f_s/2$. The upper half of the output is the complex conjugate mirror of the lower half (for real inputs) and usually discarded.
The Nyquist limit and aliasing
Sampling a signal at rate $f_s$ means you can only faithfully represent frequencies up to $f_s/2$ — the Nyquist frequency. This isn’t a limitation of the DFT specifically; it’s a fundamental constraint on discrete sampling.
If the signal contains energy above $f_s/2$, those frequencies fold back down (“alias”) into the representable range and corrupt the lower frequencies. A 1100 Hz tone sampled at 2000 Hz looks identical in the DFT to a 900 Hz tone.
Formally: a frequency $f > f_s/2$ aliases to $f_s - f$. You can’t detect or correct aliasing after the fact — information is permanently lost. This is why audio is sampled at 44.1 kHz (covering up to ~22 kHz, safely above human hearing) rather than at a lower rate.
Implementation Note
The recursive version above is clean but slow in practice due to Python overhead and array allocations at each level. NumPy’s np.fft.fft uses an iterative Cooley-Tukey implementation in C with SIMD optimisations — it’s typically 100–1000× faster than the recursive Python version above, even though both are $O(n \log n)$.
For production use, always reach for np.fft.fft. The recursive implementation is for understanding the structure; the library implementation is for running it.
Next I want to look at the 2D DFT used in image compression — it’s the same idea applied along rows then columns, and it’s the basis of JPEG.