Nonstationary Gabor Frames and the Constant-Q Transform

Nonstationary Gabor frames enable you to implement time-adaptive or frequency-adaptive analysis of signals. The functions cqt and icqt use nonstationary Gabor frames to obtain a constant-Q (frequency-adaptive) transform (CQT) of a signal. A notable strength of nonstationary Gabor frames is that they enable the construction of stable inverses, yielding perfect reconstruction.

The theory of nonstationary Gabor transforms (NSGTs) was introduced by Jaillet [1] and Balazs, Dörfler, Jaillet, Holighaus, and Velasco [2]. The theory enables efficient implementations of NSGTs using FFT-based methods. Dörfler, Holighaus, Grill, and Velasco [3], [4] develop a framework for an efficient, perfectly invertible CQT. The algorithms in [3], [4] implement a phase-locked version of the CQT that does not preserve the phases that would be obtained by naïve convolution. In [5], Schörkhuber, Klapuri, Holighaus, and Dörfler develop efficient algorithms for the CQT and inverse CQT that do mimic the coefficients obtained by naïve convolution. The Large Time-Frequency Analysis Toolbox [6] provides an extensive set of algorithms for nonstationary Gabor analysis and synthesis.

In standard Gabor analysis, a window of fixed size tiles the time-frequency plane. A nonstationary Gabor frame is a collection of window functions of various sizes that are used to tile the time-frequency plane. Wavelet analysis tiles the time-frequency plane in a similar manner. You have the flexibility to change the sampling density in time or frequency. Nonstationary Gabor frames are useful in areas such as audio signal processing, where fixed-sized time-frequency windows are not optimal. Unlike the short-time Fourier transform, the windows used in the constant-Q transform have adaptable bandwidth and sampling density. In frequency space, the windows are centered at logarithmically spaced center frequencies.

Decomposing the Time-Frequency Plane

The Fourier transform of f(t) is the correlation of f(t) with e^{j ω t}:

$F (ω) = \int_{- \infty}^{\infty} f (t) e^{- j ω t} d t .$

Since e^{j ω t} does not have compact support, the Fourier transform is not an ideal choice for studying nonstationary signals. If the frequency content of a signal changes over time, the Fourier transform does not capture what those changes are or when those changes occur. The partition of the time-frequency plane shown here represents this Fourier transform behavior.
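The following minimal Python sketch illustrates this limitation. It is not part of cqt or icqt; the signal length, the two tone frequencies, and the direct O(N^2) DFT helper are assumptions chosen only to keep the example small. Two signals contain the same two tones in opposite order, yet their Fourier magnitudes are indistinguishable.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform (adequate for short signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

N = 64
# Signal A: a low tone (4 cycles per N samples) followed by a high tone (12 cycles).
sig_a = [math.cos(2 * math.pi * (4 if t < N // 2 else 12) * t / N) for t in range(N)]
# Signal B: the same samples in reverse order, so the high tone comes first.
sig_b = sig_a[::-1]

mag_a = [abs(c) for c in dft(sig_a)]
mag_b = [abs(c) for c in dft(sig_b)]

# The magnitude spectra agree up to rounding error although the signals differ in time.
max_diff = max(abs(a - b) for a, b in zip(mag_a, mag_b))
print(f"largest magnitude difference: {max_diff:.2e}")
```

The time ordering is encoded only in the phase of the transform, which is why a windowed analysis is needed to localize the change.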

To perform a time-frequency analysis of a nonstationary signal f(t), use a window function $g (t)$ that has these properties:

It is even and real-valued.
It is effectively nonzero over only a finite interval.
It has norm equal to one.
Its Fourier transform is centered at zero and is lowpass.

Slide the window $g (t)$ over f(t) and take the Fourier transform of the result:

$S F (u, ζ) = \int f (t) g (t - u) e^{- j ζ t} d t .$
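A discrete analogue of this integral can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation used by any toolbox function: the helper names hann and stft_coefficient, the window width, and the test signal are all hypothetical. The signal contains a low tone in its first half and a high tone in its second half, and the coefficient magnitude is evaluated at two times and two frequencies.

```python
import cmath
import math

def hann(width):
    """Even, real-valued Hann-type window of odd width, scaled to unit norm."""
    half = width // 2
    w = [0.5 + 0.5 * math.cos(math.pi * n / (half + 1)) for n in range(-half, half + 1)]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w]

def stft_coefficient(f, g, u, zeta):
    """Discrete analogue of SF(u, zeta) = sum_t f[t] g[t - u] exp(-j zeta t)."""
    half = len(g) // 2
    total = 0j
    for n in range(-half, half + 1):
        t = u + n
        if 0 <= t < len(f):
            total += f[t] * g[n + half] * cmath.exp(-1j * zeta * t)
    return total

N = 256
low, high = 2 * math.pi * 0.05, 2 * math.pi * 0.20  # frequencies in radians per sample
f = [math.cos((low if t < N // 2 else high) * t) for t in range(N)]
g = hann(63)

early_low = abs(stft_coefficient(f, g, 64, low))
early_high = abs(stft_coefficient(f, g, 64, high))
late_low = abs(stft_coefficient(f, g, 192, low))
late_high = abs(stft_coefficient(f, g, 192, high))
print(early_low, early_high, late_low, late_high)
```

The coefficient is large only where the window position and the analysis frequency both match the signal content, which is the localization that the plain Fourier transform lacks.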

Correlating f(t) with the Gabor atoms $g (t - u) e^{j ζ t}$ is standard Gabor analysis. By varying u, you consider only values of f(t) near time u. The support of $g (t)$ determines the size of the neighborhood near time u. The Fourier transform of $g_{u, ζ} (t) = g (t - u) e^{j ζ t}$ is the Fourier transform of $g (t)$ translated by ζ and multiplied by a phase factor that depends on u. It is given by

${\hat{g}}_{u, ζ} (ω) = e^{- j u (ω - ζ)} \hat{g} (ω - ζ) .$

The energy concentration of ${\hat{g}}_{u, ζ} (ω)$ has variance σ_ω^2 and is centered at ζ. If the window $g_{u, ζ} (t) = g (t - u) e^{j ζ t}$ shifts on a regular grid, the Fourier transform of the product of the shifted window and f(t) is the short-time Fourier transform (STFT). The STFT tiling of the time-frequency plane can be represented as a grid of boxes, each centered at (u, ζ).

The set of functions $\{g_{u, ζ}\}$ is known as a Gabor frame. The elements of this set are called Gabor atoms. A frame is a set of functions, {h_k(t)}, that satisfies the following condition: there exist constants 0 < A ≤ B < ∞ such that for any finite-energy function f(t),

$A \| f \|^{2} \leq \sum_{k} | \langle f, h_{k} \rangle |^{2} \leq B \| f \|^{2} .$
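As a finite-dimensional illustration of the frame inequality (an analogy chosen for this sketch, not a Gabor frame), consider three unit vectors in the plane spaced 120 degrees apart. For this set the two bounds coincide, A = B = 3/2, which the following Python sketch checks on a few random vectors. The variable names and the random seed are arbitrary.

```python
import math
import random

# Three unit vectors in the plane, 120 degrees apart.
angles = (math.pi / 2, math.pi / 2 + 2 * math.pi / 3, math.pi / 2 + 4 * math.pi / 3)
frame = [(math.cos(a), math.sin(a)) for a in angles]

def frame_energy(f):
    """Sum over k of |<f, h_k>|^2 for a vector f in R^2."""
    return sum((f[0] * h[0] + f[1] * h[1]) ** 2 for h in frame)

random.seed(0)
ratios = []
for _ in range(5):
    f = (random.uniform(-1, 1), random.uniform(-1, 1))
    norm_sq = f[0] ** 2 + f[1] ** 2
    # Ratio of the frame energy to the squared norm; it must lie between A and B.
    ratios.append(frame_energy(f) / norm_sq)

print(ratios)
```

Every ratio equals 3/2 up to rounding, so the inequality holds with A = B. A frame with equal bounds is called tight, and the redundancy (three vectors for a two-dimensional space) is what makes stable reconstruction possible.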

The energy concentration of $g (t)$ , in time, has variance σ_t^2. The energy concentration of $\hat{g} (ω)$ , in frequency, has variance σ_ω^2. The energy concentration determines how well the window localizes the signal in time and frequency. By the time-frequency uncertainty principle, there is a limit as to how well you can simultaneously localize in both time and frequency domains, as indicated by

$σ_{t} σ_{ω} \geq \frac{1}{2} .$

Narrowing the window in one domain results in poorer localization in the other domain. Gabor showed that the time-frequency area of the window is minimal, meaning that the uncertainty bound holds with equality, when $g (t)$ is Gaussian.
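The Gaussian case can be checked numerically. The Python sketch below is an illustration under assumptions: the width parameter s, the grid sizes, and the helper spread are arbitrary choices, and the Fourier transform is approximated by direct numerical integration. It estimates the standard deviations of the energy densities in time and frequency and prints their product.

```python
import math

def spread(values, weights, step):
    """Standard deviation of a zero-centered density sampled on a uniform grid."""
    total = sum(weights) * step
    second_moment = sum(v * v * w for v, w in zip(values, weights)) * step
    return math.sqrt(second_moment / total)

s = 1.5  # assumed width parameter of the Gaussian window
n = 601
ts = [-8 * s + 16 * s * i / (n - 1) for i in range(n)]
dt = ts[1] - ts[0]
g = [math.exp(-t * t / (2 * s * s)) for t in ts]

# Fourier transform by numerical integration; g is real and even, so cosines suffice.
ws = [-8 / s + 16 / s * i / (n - 1) for i in range(n)]
dw = ws[1] - ws[0]
g_hat = [sum(gv * math.cos(w * t) for gv, t in zip(g, ts)) * dt for w in ws]

sigma_t = spread(ts, [v * v for v in g], dt)
sigma_w = spread(ws, [v * v for v in g_hat], dw)
product = sigma_t * sigma_w
print(product)
```

The printed product is very close to 1/2 for any choice of s. Changing s trades time spread for frequency spread without changing the product.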

Constant-Q Transform

In the CQT, the bandwidth and sampling density in frequency are varied. The windows are constructed and applied directly in the frequency domain. Different windows have different center frequencies and bandwidths, but the ratio of the center frequency to bandwidth remains constant. Maintaining a constant ratio implies:

Resolution in time improves at higher frequencies.
Resolution in frequency improves at lower frequencies.

The time shifts for each window depend on the bandwidth, due to the uncertainty principle.

The CQT depends on:

The window functions g_k, which are real-valued, even functions. In the frequency domain, the Fourier transform of g_k is defined on the interval [-ζ_s/2, ζ_s/2].
The sampling rate, ζ_s.
The number of bins per octave, b.
The minimum and maximum frequencies, ζ_min and ζ_max.

Choose a minimum frequency ζ_min and number of bins per octave b. Next, form a sequence of geometrically spaced frequencies,

$ζ_{k} = ζ_{\min} \times 2^{k / b}$

for k = 0, ..., K, where K is an integer such that ζ_K is the largest frequency strictly less than the Nyquist frequency ζ_s/2. The bandwidth at the kth frequency is set to Ω_k = ζ_{k+1} - ζ_{k-1}. Given this sampling, the ratio of the kth center frequency to the window bandwidth is independent of k:

$Q = ζ_{k} / Ω_{k} = (2^{1 / b} - 2^{- 1 / b})^{- 1} .$
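The construction of the frequency grid can be sketched in a few lines of Python. This is a minimal sketch, not the code used by cqt: the function name cq_frequencies and the values chosen for the sampling rate, minimum frequency, and bins per octave are assumptions. Only interior bins are used for the bandwidths because the first and last bins need the neighboring DC and Nyquist entries.

```python
import math

def cq_frequencies(f_min, bins_per_octave, fs):
    """Geometrically spaced center frequencies f_min * 2**(k / b) strictly below fs / 2."""
    freqs = []
    k = 0
    while True:
        f = f_min * 2.0 ** (k / bins_per_octave)
        if f >= fs / 2:
            break
        freqs.append(f)
        k += 1
    return freqs

fs = 44100.0   # assumed sampling rate in Hz
f_min = 55.0   # assumed minimum frequency in Hz
b = 12         # assumed number of bins per octave

freqs = cq_frequencies(f_min, b, fs)
interior = range(1, len(freqs) - 1)
# Bandwidth of an interior bin: distance between its two neighboring center frequencies.
bandwidths = [freqs[k + 1] - freqs[k - 1] for k in interior]
q_values = [freqs[k] / bw for k, bw in zip(interior, bandwidths)]
q_formula = 1.0 / (2.0 ** (1.0 / b) - 2.0 ** (-1.0 / b))
print(f"{len(freqs)} bins, Q from grid = {q_values[0]:.4f}, Q from formula = {q_formula:.4f}")
```

Every interior bin yields the same ratio, which matches the closed-form expression for Q. Increasing b narrows the bandwidths and raises Q.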

To ensure perfect reconstruction, the DC component and Nyquist frequency are prepended and appended, respectively, to the sequence.

The window functions g_k are built from a prototype function W(ω). W(ω) is a real-valued, even, continuous function that is centered at 0, positive in the interval [-½,½], and 0 elsewhere. W(ω) is translated to each center frequency ζ_k and then scaled by the bandwidth Ω_k. Evaluating the scaled and translated version of W(ω) yields the filter coefficients g_k[m], given by

g_k[m] = W((m ζ_s/L - ζ_k)/Ω_k)

for m = 0, …, L-1, where L is the signal length. By default, cqt uses the 'hann' window.
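The filter formula can be evaluated directly, as in the following Python sketch. The prototype below is a Hann shape written out by hand for illustration, and the sampling rate, signal length, center frequency, and bandwidth are hypothetical values, so this is not necessarily how cqt builds its filters. Note that a Hann prototype is exactly zero at the endpoints ±½.

```python
import math

def hann_prototype(w):
    """W(w): real, even, centered at 0, positive inside (-1/2, 1/2), zero elsewhere."""
    return 0.5 + 0.5 * math.cos(2 * math.pi * w) if abs(w) < 0.5 else 0.0

def cq_filter(center, bandwidth, fs, length):
    """g_k[m] = W((m * fs / length - center) / bandwidth) for m = 0, ..., length - 1."""
    return [hann_prototype((m * fs / length - center) / bandwidth) for m in range(length)]

fs = 8000.0        # assumed sampling rate in Hz
L = 1024           # assumed signal length in samples
center = 1000.0    # hypothetical center frequency in Hz
bandwidth = 250.0  # hypothetical bandwidth in Hz

g_k = cq_filter(center, bandwidth, fs, L)
support = [m for m, v in enumerate(g_k) if v > 0]
peak_bin = max(range(L), key=lambda m: g_k[m])
print(f"nonzero bins {support[0]}..{support[-1]}, peak at bin {peak_bin}")
```

With these values the frequency resolution is fs/L = 7.8125 Hz per bin, so the filter peaks at bin 128 (1000 Hz) and is nonzero only on the bins that fall strictly within half a bandwidth of the center frequency.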

By the uncertainty principle, the size of the bandwidth constrains the value of the time shifts. To satisfy the frame inequality, the shift a_k of g_k must satisfy

a_k ≤ ζ_s/Ω_k.

As mentioned previously, the window is applied in the frequency domain. The filters, g_k, centered at ζ_k, are formed and applied to the Fourier transform of the signal. Taking the inverse transform obtains the constant-Q coefficients.
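The frequency-domain procedure can be sketched in Python as follows. This is a simplified illustration under assumptions: it uses a direct O(N^2) DFT, a hand-written Hann prototype, and hypothetical center frequencies and bandwidths, and it keeps every band at the full sampling rate instead of applying the band-dependent time shifts a_k. A pure tone is analyzed with one filter centered on the tone and one filter centered elsewhere.

```python
import cmath
import math

def dft(x, sign=-1):
    """Direct O(N^2) DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def hann_prototype(w):
    """Real, even prototype that is positive inside (-1/2, 1/2) and zero elsewhere."""
    return 0.5 + 0.5 * math.cos(2 * math.pi * w) if abs(w) < 0.5 else 0.0

fs, L = 1024.0, 128   # assumed sampling rate in Hz and signal length in samples
tone = 128.0          # test tone in Hz
signal = [math.cos(2 * math.pi * tone * t / fs) for t in range(L)]
spectrum = dft(signal)

def band_coefficients(center, bandwidth):
    """Multiply the spectrum by the filter g_k, then return to the time domain."""
    g_k = [hann_prototype((m * fs / L - center) / bandwidth) for m in range(L)]
    filtered = [s * g for s, g in zip(spectrum, g_k)]
    return [c / L for c in dft(filtered, sign=+1)]

on_tone = band_coefficients(center=128.0, bandwidth=64.0)
off_tone = band_coefficients(center=256.0, bandwidth=128.0)
max_on = max(abs(c) for c in on_tone)
max_off = max(abs(c) for c in off_tone)
print(max_on, max_off)
```

The band centered on the tone returns complex coefficients of magnitude close to 0.5, which is half the amplitude of the real cosine because only its positive-frequency component passes the filter. The band centered away from the tone returns coefficients that are zero up to rounding.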

References

[1] Jaillet, Florent. “Représentation et traitement temps-fréquence des signaux audionumériques pour des applications de design sonore.” Ph.D. dissertation, Université de la Méditerranée, Aix-Marseille II, 2005.

[2] Balazs, P., M. Dörfler, F. Jaillet, N. Holighaus, and G. Velasco. “Theory, Implementation and Applications of Nonstationary Gabor Frames.” Journal of Computational and Applied Mathematics 236, no. 6 (October 2011): 1481–96. https://doi.org/10.1016/j.cam.2011.09.011.

[3] Holighaus, Nicki, M. Dörfler, G. A. Velasco, and T. Grill. “A Framework for Invertible, Real-Time Constant-Q Transforms.” IEEE Transactions on Audio, Speech, and Language Processing 21, no. 4 (April 2013): 775–85. https://doi.org/10.1109/TASL.2012.2234114.

[4] Velasco, G. A., N. Holighaus, M. Dörfler, and T. Grill. “Constructing an Invertible Constant-Q Transform with Nonstationary Gabor Frames.” In Proceedings of the 14th International Conference on Digital Audio Effects (DAFx-11). Paris, France: 2011.

[5] Schörkhuber, C., A. Klapuri, N. Holighaus, and M. Dörfler. “A MATLAB® Toolbox for Efficient Perfect Reconstruction Time-Frequency Transforms with Log-Frequency Resolution.” Submitted to the AES 53rd International Conference on Semantic Audio. London, UK: 2014.

[6] Průša, Z., P. L. Søndergaard, N. Holighaus, C. Wiesmeyr, and P. Balazs. The Large Time-Frequency Analysis Toolbox 2.0. Sound, Music, and Motion, Lecture Notes in Computer Science 2014, pp 419–442. https://github.com/ltfat