

Short-time objective intelligibility measure

Since R2024a



    metric = stoi(processed,reference,fs) returns the short-time objective intelligibility (STOI) metric for the processed speech signal. STOI is a speech intelligibility metric that compares the processed speech signal with a clean reference signal. Both signals are sampled at fs Hz.



    Read in an audio file containing a clean speech signal. Add pink noise to create a noisy speech signal.

    [cleanSpeech,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
    noisySpeech = cleanSpeech + 0.1*pinknoise(size(cleanSpeech));

    Use stoi to measure the intelligibility of the noisy speech signal with the clean speech as the reference signal.

    metric = stoi(noisySpeech,cleanSpeech,fs)
    metric = 0.9811

    Recreate the noisy speech signal with more pink noise and measure the intelligibility. See how the noisier signal has lower intelligibility according to the STOI metric.

    noisySpeech = cleanSpeech + 3*pinknoise(size(cleanSpeech));
    metric = stoi(noisySpeech,cleanSpeech,fs)
    metric = 0.5943
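
    To see the trend across more than two noise levels, the comparison above can be sketched as a sweep. This is a minimal illustration rather than part of the original example: the gain values are arbitrary choices, and the noise is generated once and rescaled so that only its level changes between runs.

```matlab
% Sketch: sweep the pink-noise gain and record the STOI metric at each level.
[cleanSpeech,fs] = audioread("Rainbow-16-8-mono-114secs.wav");

% Generate the noise once so that only the gain differs between runs.
noise = pinknoise(size(cleanSpeech));

gains = [0.1 0.3 1 3];            % arbitrary illustrative gains
metrics = zeros(size(gains));

for k = 1:numel(gains)
    noisySpeech = cleanSpeech + gains(k)*noise;
    metrics(k) = stoi(noisySpeech,cleanSpeech,fs);
end

% Display the gain and metric side by side.
results = table(gains(:),metrics(:),VariableNames=["NoiseGain" "STOI"])
```

    Because pinknoise is random, the exact values vary from run to run.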

    Read in an audio file containing speech and noise. Also read in an audio file containing the original clean speech to use as a reference signal.

    [noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
    reference = audioread("CleanSpeech-16-mono-3secs.ogg");

    Calculate the STOI metric for the noisy speech signal using stoi.

    noisySpeechSTOI = stoi(noisySpeech,reference,fs)
    noisySpeechSTOI = 0.8370

    Use enhanceSpeech to enhance the speech signal. Evaluate the enhanced signal using the STOI metric and see the improvement compared to the STOI of the noisy signal.

    enhancedSpeech = enhanceSpeech(noisySpeech,fs);
    enhancedSpeechSTOI = stoi(enhancedSpeech,reference,fs)
    enhancedSpeechSTOI = single
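
    The improvement can also be expressed as a single number. The following sketch repeats the steps above and reports the difference between the two metrics. The variable name improvement and the cast to double are illustrative choices, made here because the enhanced metric is displayed as single.

```matlab
% Sketch: quantify the change in STOI after enhancement.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

enhancedSpeech = enhanceSpeech(noisySpeech,fs);

noisySpeechSTOI = stoi(noisySpeech,reference,fs);
enhancedSpeechSTOI = stoi(enhancedSpeech,reference,fs);

% Cast both metrics to double so the subtraction uses one data type.
improvement = double(enhancedSpeechSTOI) - double(noisySpeechSTOI);

fprintf("STOI: %.4f (noisy), %.4f (enhanced), change %+.4f\n", ...
    noisySpeechSTOI,enhancedSpeechSTOI,improvement)
```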

    Input Arguments


    Processed speech signal, specified as a column vector (single channel) with the same size as reference. STOI measures the intelligibility of this processed signal.

    Data Types: single | double

    Reference speech signal, specified as a column vector (single channel) with the same size as processed. The STOI metric compares the processed signal with this reference signal to measure the intelligibility.

    Data Types: single | double

    Sample rate of both the processed and reference signals in Hz, specified as a positive scalar.

    Data Types: single | double
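
    Recordings do not always arrive as equal-length, single-channel column vectors at a shared sample rate. The following is a minimal sketch of one way to prepare two recordings before calling stoi. It is an assumption-laden illustration: it uses resample from Signal Processing Toolbox, averages channels to downmix, and trims to the shorter length, which presumes the two signals are already time-aligned at their start. It does not estimate or correct any delay between them.

```matlab
% Sketch: bring two recordings into the form that stoi expects.
[processed,fsProcessed] = audioread("NoisySpeech-16-mono-3secs.ogg");
[reference,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");

% 1. Match the sample rates (resample the processed signal if they differ).
if fsProcessed ~= fs
    processed = resample(processed,fs,fsProcessed);
end

% 2. Downmix to a single channel. audioread returns samples-by-channels,
%    so averaging along dimension 2 leaves a column vector.
processed = mean(processed,2);
reference = mean(reference,2);

% 3. Trim both signals to the shorter length (assumes aligned start times).
n = min(size(processed,1),size(reference,1));
processed = processed(1:n);
reference = reference(1:n);

metric = stoi(processed,reference,fs)
```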

    Output Arguments


    STOI metric, returned as a scalar in the range [-1,1]. STOI measures the intelligibility of the processed input signal by comparing it with the clean reference signal. A higher value for the metric corresponds to a more intelligible speech signal.
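
    One way to get a feel for the scale is to compare a signal against itself and then against a degraded copy. This sketch is illustrative only: the noise gain is arbitrary, and the self-comparison is expected to land at or very near the top of the range rather than being a documented guarantee.

```matlab
% Sketch: sanity-check the metric scale with a best-case comparison.
[reference,fs] = audioread("CleanSpeech-16-mono-3secs.ogg");

% Comparing the reference with itself represents the best case.
selfMetric = stoi(reference,reference,fs);

% Adding noise should move the metric down from the best case.
degraded = reference + 0.5*pinknoise(size(reference));   % arbitrary gain
degradedMetric = stoi(degraded,reference,fs);

fprintf("Self comparison: %.4f, degraded: %.4f\n",selfMetric,degradedMetric)
```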


    References

    [1] C. H. Taal, R. C. Hendriks, R. Heusdens and J. Jensen, "A short-time objective intelligibility measure for time-frequency weighted noisy speech," 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA, 2010, pp. 4214-4217, doi: 10.1109/ICASSP.2010.5495701.

    [2] C. H. Taal, R. C. Hendriks, R. Heusdens and J. Jensen, "An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 7, pp. 2125-2136, Sept. 2011, doi: 10.1109/TASL.2011.2114881.

    Extended Capabilities

    C/C++ Code Generation
    Generate C and C++ code using MATLAB® Coder™.

    GPU Arrays
    Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.
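
    As a rough sketch of GPU usage, the signals can be moved to the GPU with gpuArray before calling stoi, and the result brought back with gather. The canUseGPU guard and the fallback to the CPU are illustrative choices rather than requirements.

```matlab
% Sketch: compute STOI on the GPU when one is available.
[noisySpeech,fs] = audioread("NoisySpeech-16-mono-3secs.ogg");
reference = audioread("CleanSpeech-16-mono-3secs.ogg");

if canUseGPU
    % Transfer both signals to GPU memory, then gather the scalar result.
    metric = stoi(gpuArray(noisySpeech),gpuArray(reference),fs);
    metric = gather(metric);
else
    % No supported GPU found, so compute on the CPU instead.
    metric = stoi(noisySpeech,reference,fs);
end

disp(metric)
```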

    Version History

    Introduced in R2024a