fgg blog

Short-time Fourier Transform

Sine Wave Signal

An audio signal $y(t)$ composed of exactly one sine wave can be completely described, as a function of time $t$, by the three parameters $A$, $f$, and $\phi$: $$ y(t) = A \sin(2 \pi f t + \phi) $$ where $t$ represents time in seconds, $A$ is the wave’s amplitude (unit-less), $f$ is its frequency in Hz, and $\phi$ is its phase offset in radians (i.e., where in the cycle the wave is at $t=0$). If $\phi \ne 0$, then the sine wave appears shifted in time by $\frac{\phi}{2 \pi f}$ seconds, where negative values delay the wave and positive values advance it.
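To make the phase-to-time relationship concrete, here is a minimal NumPy sketch. The sample rate `sr`, the helper name `sine_wave`, and the particular values of `A`, `f`, and `phi` are all assumptions chosen for illustration.

```python
import numpy as np

def sine_wave(t, A, f, phi):
    """Evaluate y(t) = A * sin(2*pi*f*t + phi) at the times in t."""
    return A * np.sin(2 * np.pi * f * t + phi)

sr = 8000                    # sample rate in Hz (assumed for this sketch)
t = np.arange(sr) / sr       # one second of sample times
A, f, phi = 0.5, 440.0, np.pi / 2

y = sine_wave(t, A, f, phi)

# A phase offset of phi radians is the same as advancing the wave
# by phi / (2*pi*f) seconds and using no phase offset at all.
shift = phi / (2 * np.pi * f)
y_shifted = sine_wave(t + shift, A, f, 0.0)

print(np.allclose(y, y_shifted))
```

Both arrays describe the same wave, which is why a phase offset and a time shift are interchangeable for a single sine.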

Fourier Series

Our old pal Fourier told us that any sound can be represented as a (possibly infinite) sum of sine waves, each with its own amplitude, frequency, and phase offset. This means that any sound we hear can be described by many, many tuples of $(A, f, \phi)$.
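As a rough illustration of that idea, the sketch below builds an approximate square wave out of $(A, f, \phi)$ tuples, using the well-known Fourier series of a square wave (odd harmonics with amplitudes $\frac{4}{\pi k}$). The helper `sum_of_sines`, the fundamental `f0`, and the number of harmonics are assumptions made for this example.

```python
import numpy as np

def sum_of_sines(t, components):
    """Add up sine waves given as (A, f, phi) tuples."""
    y = np.zeros_like(t)
    for A, f, phi in components:
        y += A * np.sin(2 * np.pi * f * t + phi)
    return y

sr = 8000                    # sample rate in Hz (assumed)
t = np.arange(sr) / sr       # one second of sample times
f0 = 5.0                     # fundamental frequency of the square wave (assumed)

# Odd harmonics k = 1, 3, 5, ..., 199, each with amplitude 4 / (pi * k).
components = [(4 / (np.pi * k), k * f0, 0.0) for k in range(1, 200, 2)]

approx = sum_of_sines(t, components)
square = np.sign(np.sin(2 * np.pi * f0 * t))   # target square wave

mean_abs_error = np.mean(np.abs(approx - square))
print(len(components), mean_abs_error)
```

Adding more tuples to `components` brings the sum closer to the target, although some ripple stays near the jumps.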

Time-Frequency representation

A time-frequency (TF) representation is a two-dimensional matrix that describes how the frequency content of an audio signal changes over time.

We can visualize a TF representation using a heatmap, which has time along the x-axis and frequency along the y-axis. Each TF bin (an entry in the heatmap) represents the amplitude of the signal at that particular time and frequency. If there is no color bar, it is usually safe to assume that brighter colors indicate higher amplitudes than darker colors.

[Figure: TF representation]

Short-time Fourier Transform (STFT)

An STFT is calculated from a waveform by sliding a short window across the duration of the signal and computing a discrete Fourier transform (DFT) of each windowed segment. The location of each entry in an STFT determines its time (x-axis) and frequency (y-axis). The magnitude of a TF bin, $|X(t,f)|$, indicates how strongly frequency $f$ is present in the signal at time $t$.
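The procedure can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the frame length `n_fft`, the hop size `hop`, the Hann window, and the test signal are all assumptions, and real libraries usually add padding and other options.

```python
import numpy as np

def stft(y, n_fft=512, hop=128):
    """Minimal STFT: slice y into overlapping frames, apply a Hann window,
    and take one real-input DFT per frame.
    Returns a complex array of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Rows become frequency bins, columns become time frames.
    return np.fft.rfft(frames, axis=1).T

sr = 8000                        # sample rate in Hz (assumed)
t = np.arange(2 * sr) / sr       # two seconds of sample times
# Test signal: 500 Hz during the first second, 1500 Hz during the second.
y = np.where(t < 1.0, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

X = stft(y)
freqs = np.fft.rfftfreq(512, d=1 / sr)        # center frequency of each bin in Hz
peak_hz = freqs[np.abs(X).argmax(axis=0)]     # strongest frequency in each frame

print(X.shape, peak_hz[0], peak_hz[-1])
```

Each column of `X` is the spectrum of one short segment, so the strongest bin moves from 500 Hz to 1500 Hz as the frames pass the one-second mark.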

Importantly, each bin in our STFT is complex, meaning each entry contains both a magnitude component and a phase component. Both components are needed to convert an STFT matrix back to a waveform by inverse STFT so that we may hear it.
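A small sketch can show why the phase matters. It assumes a Hann window, a frame length of 512, a hop of 128, and a simple weighted overlap-add inverse; the helper names `stft` and `istft` are defined here for the example and are not any particular library's API.

```python
import numpy as np

n_fft, hop = 512, 128            # frame length and hop size (assumed)
window = np.hanning(n_fft)

def stft(y):
    """Hann-windowed frames, one real-input DFT per frame."""
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack(
        [y[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

def istft(X):
    """Weighted overlap-add inverse of the stft above."""
    frames = np.fft.irfft(X.T, n=n_fft, axis=1)
    length = n_fft + hop * (frames.shape[0] - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        y[i * hop : i * hop + n_fft] += frame * window
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Avoid dividing by zero at the edges, where the window reaches zero.
    return y / np.maximum(norm, 1e-8)

sr = 8000
t = np.arange(sr) / sr
y = 0.6 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 1000 * t + 1.0)

X = stft(y)
mag, phase = np.abs(X), np.angle(X)

# Magnitude and phase together carry the same information as X.
X_rebuilt = mag * np.exp(1j * phase)
y_hat = istft(X_rebuilt)

# Dropping the phase (treating every bin as having zero phase) loses information.
y_no_phase = istft(mag.astype(complex))

inner = slice(n_fft, len(y_hat) - n_fft)   # skip the poorly covered edges
print(np.allclose(y_hat[inner], y[inner]))
print(np.allclose(y_no_phase[inner], y[inner]))
```

In this sketch the comparison skips the first and last frame's worth of samples, where too few windows overlap for the inverse to be reliable.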

[Figure: STFT]

Window Types

The window type determines the shape of the short-time window that is applied to each short segment of audio before the DFT. The shape of this window affects how the DFT represents each frequency: it trades off how sharply a frequency is resolved against how much of its energy leaks into neighboring bins. There are many types of window functions, such as Hann, Hamming, and Blackman.
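One way to see the effect of the window shape is to analyze a single sine wave that falls between two DFT bins and measure how much energy shows up far away from it. The sketch below does that with NumPy's built-in window functions; the frame length, sample rate, test frequency, and the choice to measure leakage from bin 128 upward are all assumptions for this example.

```python
import numpy as np

n_fft, sr = 512, 8000            # frame length and sample rate (assumed)
t = np.arange(n_fft) / sr
# 507.8125 Hz sits halfway between bins 32 and 33, a bad case for leakage.
y = np.sin(2 * np.pi * 507.8125 * t)

windows = {
    "rectangular": np.ones(n_fft),   # no tapering at all
    "hann": np.hanning(n_fft),
    "hamming": np.hamming(n_fft),
    "blackman": np.blackman(n_fft),
}

leakage_db = {}
for name, w in windows.items():
    mag = np.abs(np.fft.rfft(y * w))
    mag_db = 20 * np.log10(mag / mag.max() + 1e-12)   # dB relative to the peak
    # Highest level found in bins 128 and up (2000 Hz and above), far from the sine.
    leakage_db[name] = mag_db[128:].max()

for name, level in leakage_db.items():
    print(f"{name:12s} {level:8.1f} dB")
```

A lower number means less energy has leaked into bins far from the true frequency; tapered windows generally do much better than the rectangular one here, at the cost of a wider peak.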

[Figure: window functions]