melSpectrogram
Mel spectrogram
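Syntax
S = melSpectrogram(audioIn,fs)
S = melSpectrogram(audioIn,fs,Name,Value)
[S,F,T] = melSpectrogram(___)
melSpectrogram(___)
Description
S = melSpectrogram(audioIn,fs) returns the mel spectrogram of the audio input audioIn at sample rate fs. The function treats columns of audioIn as independent audio channels.
S = melSpectrogram(audioIn,fs,Name,Value) specifies options using one or more Name,Value pair arguments.
[S,F,T] = melSpectrogram(___) also returns the center frequencies of the mel bandpass filters in Hz and the location of each analysis window in seconds.
melSpectrogram(___) plots the mel spectrogram on a surface in the current figure.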
Examples
Calculate Mel Spectrogram
Use the default settings to calculate the mel spectrogram for an entire audio file. Print the number of bandpass filters in the filter bank and the number of frames in the mel spectrogram.
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
S = melSpectrogram(audioIn,fs);
[numBands,numFrames] = size(S);
fprintf("Number of bandpass filters in filterbank: %d\n",numBands)
Number of bandpass filters in filterbank: 32
fprintf("Number of frames in spectrogram: %d\n",numFrames)
Number of frames in spectrogram: 1551
Plot the mel spectrogram.
melSpectrogram(audioIn,fs)
Calculate Mel Spectrums of 2048-Point Windows
Calculate the mel spectrums of 2048-point periodic Hann windows with 1024-point overlap. Convert to the frequency domain using a 4096-point FFT. Pass the frequency-domain representation through 64 half-overlapped triangular bandpass filters that span the range 62.5 Hz to 8 kHz.
[audioIn,fs] = audioread('FunkyDrums-44p1-stereo-25secs.mp3');
S = melSpectrogram(audioIn,fs, ...
    'Window',hann(2048,'periodic'), ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3]);
Call melSpectrogram again, this time with no output arguments so that you can visualize the mel spectrogram. The input audio is a multichannel signal. If you call melSpectrogram with a multichannel input and with no output arguments, only the first channel is plotted.
melSpectrogram(audioIn,fs, ...
    'Window',hann(2048,'periodic'), ...
    'OverlapLength',1024, ...
    'FFTLength',4096, ...
    'NumBands',64, ...
    'FrequencyRange',[62.5,8e3])
Get Filter Bank Center Frequencies and Analysis Window Time Instants
melSpectrogram applies a frequency-domain filter bank to audio signals that are windowed in time. You can get the center frequencies of the filters and the time instants corresponding to the analysis windows as the second and third output arguments from melSpectrogram.
Get the mel spectrogram, filter bank center frequencies, and analysis window time instants of a multichannel audio signal. Use the center frequencies and time instants to plot the mel spectrogram for each channel.
[audioIn,fs] = audioread('AudioArray-16-16-4channels-20secs.wav');
[S,cF,t] = melSpectrogram(audioIn,fs);
S = 10*log10(S+eps); % Convert to dB for plotting
for i = 1:size(S,3)
    figure(i)
    surf(t,cF,S(:,:,i),'EdgeColor','none');
    xlabel('Time (s)')
    ylabel('Frequency (Hz)')
    view([0,90])
    title(sprintf('Channel %d',i))
    axis([t(1) t(end) cF(1) cF(end)])
end
Input Arguments
audioIn — Audio input
column vector | matrix
Audio input, specified as a column vector or matrix. If specified as a matrix, the function treats columns as independent audio channels.
Data Types: single | double
fs — Input sample rate (Hz)
positive scalar
Input sample rate in Hz, specified as a positive scalar.
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'WindowLength',1024
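The two name-value syntaxes described above can be compared side by side. This is a minimal sketch that assumes Audio Toolbox is installed and that the release is R2021a or later (needed for the first call); the noise signal is only stand-in audio.

```matlab
fs = 16000;
audioIn = randn(fs,1);   % 1 s of noise as stand-in audio (illustrative only)

% R2021a and later: Name=Value syntax.
S1 = melSpectrogram(audioIn,fs,NumBands=64);

% Any release: comma-separated pairs with the name in quotes.
S2 = melSpectrogram(audioIn,fs,'NumBands',64);

% Both calls pass the same option, so the results should match.
sameResult = isequal(S1,S2);
```

Both forms set the same option, so the choice between them depends only on the release you need to support.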
Window — Window applied in time domain
hamming(round(fs*0.03),'periodic') (default) | vector
Window applied in time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [1,size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.
Data Types: single | double
OverlapLength — Analysis window overlap length (samples)
round(0.02*fs) (default) | integer in the range [0, (WindowLength - 1)]
Analysis window overlap length in samples, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, (WindowLength - 1)], where WindowLength is the number of elements in Window.
Data Types: single | double
FFTLength — Number of DFT points
WindowLength (default) | positive integer
Number of points used to calculate the DFT, specified as the comma-separated pair consisting of 'FFTLength' and a positive integer greater than or equal to WindowLength. If unspecified, FFTLength defaults to WindowLength.
Data Types: single | double
NumBands — Number of mel bandpass filters
32 (default) | positive integer
Number of mel bandpass filters, specified as the comma-separated pair consisting of 'NumBands' and a positive integer.
Data Types: single | double
FrequencyRange — Frequency range over which to compute mel spectrogram (Hz)
[0 fs/2] (default) | two-element row vector
Frequency range over which to compute the mel spectrogram in Hz, specified as the comma-separated pair consisting of 'FrequencyRange' and a two-element row vector of monotonically increasing values in the range [0, fs/2].
Data Types: single | double
SpectrumType — Type of mel spectrogram
'power' (default) | 'magnitude'
Type of mel spectrogram, specified as the comma-separated pair consisting of 'SpectrumType' and 'power' or 'magnitude'.
Data Types: char | string
WindowNormalization — Apply window normalization
true (default) | false
Apply window normalization, specified as the comma-separated pair consisting of 'WindowNormalization' and true or false. When WindowNormalization is set to true, the power (or magnitude) in the mel spectrogram is normalized to remove the power (or magnitude) of the time-domain Window.
Data Types: char | string
FilterBankNormalization — Type of filter bank normalization
'bandwidth' (default) | 'area' | 'none'
Type of filter bank normalization, specified as the comma-separated pair consisting of 'FilterBankNormalization' and 'bandwidth', 'area', or 'none'.
Data Types: char | string
Output Arguments
S — Mel spectrogram
column vector | matrix | 3-D array
Mel spectrogram, returned as a column vector, matrix, or 3-D array. The dimensions of S are L-by-M-by-N, where:
L is the number of frequency bins in each mel spectrum. NumBands and fs determine L.
M is the number of frames the audio signal is partitioned into. size(audioIn,1), WindowLength, and OverlapLength determine M.
N is the number of channels such that N = size(audioIn,2).
Trailing singleton dimensions are removed from the output S.
Data Types: single | double
F — Center frequencies of mel bandpass filters (Hz)
row vector
Center frequencies of mel bandpass filters in Hz, returned as a row vector with length size(S,1).
Data Types: single | double
T — Location of each window of audio (s)
row vector
Location of each analysis window of audio in seconds, returned as a row vector with length size(S,2). The location corresponds to the center of each window.
Data Types: single | double
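The relationship between the input length, the window length, the overlap length, and the number of frames M can be sketched in base MATLAB. The sketch assumes that only complete frames are kept (no zero-padding of the final partial frame) and that each time instant is the window center measured from the first sample. Neither assumption is stated on this page, and all variable names are illustrative.

```matlab
% Illustrative values; these variable names are not part of melSpectrogram.
fs = 44100;
numSamples = 5*fs;                      % assumed: 5 s of mono audio
winLength = 2048;                       % numel(Window)
overlapLength = 1024;                   % OverlapLength
hopLength = winLength - overlapLength;  % samples between frame starts

% Assumption: only complete frames are counted.
numFrames = floor((numSamples - winLength)/hopLength) + 1;

% Assumption: each time instant marks the center of its window, in seconds.
frameStarts = (0:numFrames-1)*hopLength;
t = (frameStarts + winLength/2)/fs;

fprintf("Frames: %d, spacing between frames: %.4f s\n",numFrames,hopLength/fs)
```

Under these assumptions, the frame count depends only on the hop length and on how many whole windows fit into the signal, which is why a larger overlap produces more frames for the same audio.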
Algorithms
The melSpectrogram function follows the general algorithm to compute a mel spectrogram as described in [1].
In this algorithm, the audio input is first buffered into frames of numel(Window) samples. The frames are overlapped by OverlapLength samples. The specified Window is applied to each frame, and then the frame is converted to a frequency-domain representation with FFTLength points. The frequency-domain representation can be either magnitude or power, specified by SpectrumType. If WindowNormalization is set to true, the spectrum is normalized by the window. Each frame of the frequency-domain representation passes through a mel filter bank. The spectral values output from the mel filter bank are summed, and then the channels are concatenated so that each frame is transformed to a NumBands-element column vector.
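The framing, windowing, and FFT steps described above can be sketched in base MATLAB. This is not the internal implementation of melSpectrogram: the variable names are illustrative, only complete frames are kept, and the window normalization shown (dividing the power spectrum by sum(win)^2) is one common convention assumed here rather than taken from this page. The mel filter bank step is left out.

```matlab
% Base-MATLAB sketch; variable names are illustrative, not melSpectrogram internals.
fs = 8000;
x = sin(2*pi*440*(0:fs-1)'/fs);         % 1 s test tone at 440 Hz

winLength = 256;
overlapLength = 128;
fftLength = 512;
hopLength = winLength - overlapLength;

% Periodic Hamming window written out so that no toolbox is needed.
win = 0.54 - 0.46*cos(2*pi*(0:winLength-1)'/winLength);

% Buffer the signal into overlapping frames (assumption: complete frames only).
numFrames = floor((numel(x) - winLength)/hopLength) + 1;
idx = (1:winLength)' + (0:numFrames-1)*hopLength;  % one column of indices per frame
frames = x(idx) .* win;                             % window every frame

% Convert each frame to the frequency domain and keep the one-sided half.
X = fft(frames,fftLength);
X = X(1:fftLength/2+1,:);

% Power spectrum; use abs(X) instead for a magnitude spectrum.
P = abs(X).^2;

% Assumed normalization: divide out the window gain, sum(win)^2 for power.
P = P/sum(win)^2;
```

Each column of P is the spectrum of one frame. Multiplying a NumBands-by-(fftLength/2+1) filter bank matrix by P would then give one NumBands-element column per frame.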
Filter Bank Design
The mel filter bank is designed as half-overlapped triangular filters equally spaced on the mel scale. NumBands controls the number of mel bandpass filters. FrequencyRange controls the band edges of the first and last filters in the mel filter bank. FilterBankNormalization specifies the type of normalization applied to the individual bands.
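A filter bank of this shape can be sketched in base MATLAB as follows. The hertz-to-mel conversion used here, 2595*log10(1 + f/700), is an assumption because this page does not state which mel formula the function uses. The sketch applies no band normalization, and hz2melFcn, mel2hzFcn, and the other names are hypothetical helpers.

```matlab
% Illustrative design values; names are hypothetical, not melSpectrogram internals.
fs = 16000;
numBands = 8;
fftLength = 512;
frequencyRange = [0 fs/2];

% Assumed mel conversion formulas.
hz2melFcn = @(f) 2595*log10(1 + f/700);
mel2hzFcn = @(m) 700*(10.^(m/2595) - 1);

% numBands+2 band edges equally spaced on the mel scale, mapped back to Hz.
melEdges = linspace(hz2melFcn(frequencyRange(1)),hz2melFcn(frequencyRange(2)),numBands+2);
edges = mel2hzFcn(melEdges);
centerFreqs = edges(2:end-1);

% Frequencies of the one-sided FFT bins.
binFreqs = (0:fftLength/2)*fs/fftLength;

% Each triangle rises from the previous center to its own center and falls
% to the next center, which makes neighboring filters half-overlapped.
fb = zeros(numBands,numel(binFreqs));
for k = 1:numBands
    rising = (binFreqs - edges(k))/(edges(k+1) - edges(k));
    falling = (edges(k+2) - binFreqs)/(edges(k+2) - edges(k+1));
    fb(k,:) = max(0,min(rising,falling));
end

% Apply the bank to a flat stand-in power spectrum to get one mel spectrum.
melSpectrum = fb*ones(numel(binFreqs),1);
```

Without normalization, adjacent triangles sum to one between the first and last center frequencies, so the wider high-frequency bands collect more energy. Compensating for that is the purpose of a per-band normalization option.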
References
[1] Rabiner, Lawrence R., and Ronald W. Schafer. Theory and Applications of Digital Speech Processing. Upper Saddle River, NJ: Pearson, 2010.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
Version History
Introduced in R2019a
R2020b: WindowLength will be removed in a future release
The WindowLength parameter will be removed from the melSpectrogram function in a future release. Use the Window parameter instead.
In releases prior to R2020b, you could only specify the length of a time-domain window. The window was always designed as a periodic Hamming window. You can replace instances of the code
S = melSpectrogram(audioIn,fs,'WindowLength',1024);
with this code:
S = melSpectrogram(audioIn,fs,'Window',hamming(1024,'periodic'));
See Also
spectrogram | mfcc | gtcc | mdct | audioFeatureExtractor