Label Audio Using Audio Labeler

The Audio Labeler app enables you to interactively define and visualize ground-truth labels for audio data sets. This example shows how you can create label definitions and then interactively label a set of audio files. The example also shows how to export the labeled ground-truth data, which you can then use with audioDatastore to train a machine learning system.

Load Unlabeled Data

  1. To open the Audio Labeler, at the MATLAB® command prompt, enter:

    audioLabeler

  2. This example uses the audio files included with Audio Toolbox™. To locate the file path on your system, at the MATLAB command prompt, enter:

    fullfile(matlabroot,'toolbox','audio','samples')

    To load the audio files, click Load > Audio Folders and select the folder that contains the audio files you want to label.
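If you want to preview the files in the samples folder before loading it in the app, the following sketch collects them with audioDatastore and prints each file's sample rate and duration using audioinfo. It assumes Audio Toolbox is installed. The app may list the files differently, and the print format is only illustrative.

```matlab
% List the sample audio files and basic properties (sketch)
folder = fullfile(matlabroot,'toolbox','audio','samples');

% audioDatastore collects the files in the folder that have supported audio extensions
ads = audioDatastore(folder);

for k = 1:numel(ads.Files)
    info = audioinfo(ads.Files{k});            % read file metadata
    [~,name,ext] = fileparts(ads.Files{k});
    fprintf('%-50s %8.0f Hz %8.2f s\n',[name ext],info.SampleRate,info.Duration);
end
```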

Define and Assign Labels

File-Level Labels

The audio samples include music, speech, and ambience. To create a file-level label that defines the contents of the audio file as music, speech, ambience, or unknown, click the add-label button in the File Labels panel. Specify the Label Name as Content, the Data Type as categorical, and the Categories as music, speech, ambience, and unknown. Set the Default Value of the label definition to unknown.
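For scripted workflows, a definition equivalent to the one entered in the dialog can be sketched with signalLabelDefinition. This sketch assumes that a file-level label in the app corresponds to a LabelType of 'attribute'. The app might set additional properties that it does not show.

```matlab
% Sketch of a file-level (attribute) label with four categories
contentDef = signalLabelDefinition('Content', ...
    'LabelType','attribute', ...                        % one value per file
    'LabelDataType','categorical', ...
    'Categories',{'music','speech','ambience','unknown'}, ...
    'DefaultValue','unknown');                          % value before labeling
disp(contentDef)
```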

All audio files in the Data Browser are now associated with the Content label name. To listen to the audio file selected in the Data Browser and confirm that it is a music file, click the play button. To set the value of the Content label, click unknown in the File Labels panel and select music from the drop-down menu.

The selected audio file now has the label name Content with value music assigned to it. You can continue setting the Content value for each file by selecting a file in the Data Browser and then selecting a value from the File Labels panel.
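You can also set a file-level value programmatically. The example below assumes an in-memory labeledSignalSet built from one sample file, not the set the app manages. RockDrums-44p1-stereo-11secs.mp3 is used only for illustration. The sketch calls setLabelValue to replace the default value unknown with music.

```matlab
% Define the Content label (mirrors the definition created in the app)
contentDef = signalLabelDefinition('Content','LabelType','attribute', ...
    'LabelDataType','categorical', ...
    'Categories',{'music','speech','ambience','unknown'}, ...
    'DefaultValue','unknown');

% Read one sample file into memory (illustrative choice of file)
file = fullfile(matlabroot,'toolbox','audio','samples', ...
    'RockDrums-44p1-stereo-11secs.mp3');
[x,fs] = audioread(file);

% One-member labeled signal set; the member starts at the default value
lss = labeledSignalSet({x},contentDef,'SampleRate',fs);

% Assign the value music to member 1 (lss is a handle object, so no output is needed)
setLabelValue(lss,1,'Content','music');
lss.Labels
```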

Region-Level Labels

You can define region-level labels manually or by using the provided automated algorithms. Audio Toolbox includes automatic labeling algorithms for speech detection and speech-to-text transcription.

Note

To enable automatic speech-to-text transcription, you must download and set up the Speech-to-Text Transcription functionality. After you do so, the Speech to Text automation algorithm appears as an option on the toolstrip.

Select Counting-16-44p1-mono-15secs.wav from the Data Browser.

To create a region-level label that indicates whether speech is detected, first select Speech Detector from the Automation section. You can control the speech detection algorithm using the Window Length (s) and Merge Regions Within (s) parameters. This example uses the default values. To create a region-of-interest (ROI) label and label regions of the selected audio file, select Run.
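Outside the app, the detectSpeech function (Audio Toolbox, R2020a and later) provides a similar speech detector. It may not be the exact algorithm the app runs. This sketch assumes that detectSpeech's Window and MergeDistance arguments correspond to the app's Window Length (s) and Merge Regions Within (s) parameters. detectSpeech expects a window vector and a merge distance in samples, so the sketch converts from seconds. The 50 ms and 0.25 s values are hypothetical and are not the app's defaults.

```matlab
file = fullfile(matlabroot,'toolbox','audio','samples', ...
    'Counting-16-44p1-mono-15secs.wav');
[x,fs] = audioread(file);

windowLength  = 0.05;   % seconds (hypothetical)
mergeDistance = 0.25;   % seconds (hypothetical)

% detectSpeech takes the analysis window as a vector and the merge distance in samples
win = hann(round(windowLength*fs),'periodic');
idx = detectSpeech(x,fs,'Window',win,'MergeDistance',round(mergeDistance*fs));

% Each row of idx is [start end] in samples; convert to seconds for ROI limits
roiLimits = (idx - 1)/fs
```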

Close the Speech Detector tab. You can correct or fine-tune the automatically generated SpeechDetected regions by selecting an ROI in the ROI bar and then dragging the edges of the region. The ROI bar is directly to the right of the ROI label. When a region is selected, clicking the play button plays only the selected region, so you can verify whether the region captures all relevant auditory information.
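Playing only a selected region amounts to indexing the samples between the region's limits. The sketch below uses hypothetical limits of 1.0 s and 1.5 s. In practice, you would take the limits from a region you labeled.

```matlab
file = fullfile(matlabroot,'toolbox','audio','samples', ...
    'Counting-16-44p1-mono-15secs.wav');
[x,fs] = audioread(file);

t0 = 1.0;   % region start in seconds (hypothetical)
t1 = 1.5;   % region end in seconds (hypothetical)

% Convert seconds to sample indices (MATLAB indexing starts at 1)
seg = x(floor(t0*fs)+1 : floor(t1*fs), :);
sound(seg,fs)   % play the region through the default audio output device
```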

If you have set up a speech-to-text transcription service, select Speech to Text from the Automation section. You can control the speech-to-text transcription using name-value pair options specific to your selected service. This example uses the IBM® service and specifies no additional options.

The ROI labels returned from the transcription service are strings with beginning and end points. The beginning and end points do not exactly correspond to the beginning and end points of the manually corrected speech detection regions. You can correct the endpoints of the SpeechContent ROI label by selecting the region and then dragging the edges of the region. The transcription service misclassified the words "two" as "to," "four" as "for," and "ten" as "then." You can correct the string by selecting the region and then entering a new string.

Create another region-level label by clicking the add-label button in the ROI Labels panel. Set Label Name to VUV, set Data Type to categorical, and set Categories to voiced and unvoiced.

By default, the waveform viewer shows the entire file. To display tools for zooming and panning, hover over the top right corner of the plot. Zoom in on the first five seconds of the audio file.

When you select a region in the plot and then hover over either of the two ROI bars, the shadow of the region appears. To assign a region to the category voiced, first click the region labeled one on the SpeechContent label bar to select it. Then hover over the VUV label bar, click the shadow, and choose voiced.

The next two words, "two" and "three," contain both voiced and unvoiced speech. Select each region of speech on the plot, hover over the VUV label bar, and select the correct category for that region.
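You can also create the VUV definition and assign a region programmatically with signalLabelDefinition and setLabelValue. This sketch assumes that a region-level label in the app corresponds to LabelType 'roi'. It also assumes that ROI limits are given in seconds when the set is created with 'SampleRate'. The limits below are hypothetical placeholders, not measured word boundaries.

```matlab
% Region-level (ROI) label with two categories
vuvDef = signalLabelDefinition('VUV','LabelType','roi', ...
    'LabelDataType','categorical','Categories',{'voiced','unvoiced'});

file = fullfile(matlabroot,'toolbox','audio','samples', ...
    'Counting-16-44p1-mono-15secs.wav');
[x,fs] = audioread(file);
lss = labeledSignalSet({x},vuvDef,'SampleRate',fs);

% Mark one region of member 1 as voiced (limits in seconds, hypothetical)
setLabelValue(lss,1,'VUV',[0.55 0.90],'voiced');

% ROI values come back as a table of region limits and values
getLabelValues(lss,1,'VUV')
```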

Export Label Definitions

You can export label definitions as a MAT file or as a MATLAB script. Maintaining label definitions enables consistent labeling between users and sessions. Select Export > Label Definitions > To File.

The label definitions are saved as an array of signalLabelDefinition objects. In your next session, you can import them by selecting Import > Label Definitions > From File.
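Because label definitions are ordinary signalLabelDefinition objects, you can also store and reuse them with save and load. The file name and variable name below are hypothetical. The MAT file that the app exports might use a different variable name.

```matlab
% Build the two categorical definitions used in this example
defs = [ ...
    signalLabelDefinition('Content','LabelType','attribute', ...
        'LabelDataType','categorical', ...
        'Categories',{'music','speech','ambience','unknown'}, ...
        'DefaultValue','unknown'), ...
    signalLabelDefinition('VUV','LabelType','roi', ...
        'LabelDataType','categorical','Categories',{'voiced','unvoiced'})];

matFile = fullfile(tempdir,'myLabelDefinitions.mat');   % hypothetical file
save(matFile,'defs');

% In a later session, reload the same definitions
S = load(matFile);
reloadedDefs = S.defs;
```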

Export Labeled Audio Data

You can export the labeled signal set to a file or to your workspace. Select Export > Labels > To Workspace.

The Audio Labeler creates a labeledSignalSet object named labeledSet_HHMMSS, where HHMMSS is the time the object is created in hours, minutes, and seconds.

labeledSet_104620
labeledSet_104620 = 

  labeledSignalSet with properties:

             Source: {29×1 cell}
         NumMembers: 29
    TimeInformation: "inherent"
             Labels: [29×4 table]
        Description: ""

 Use labelDefinitionsHierarchy to see a list of labels and sublabels.
 Use setLabelValue to add data to the set.

The labels you created are stored as a table in the Labels property.

labeledSet_104620.Labels
ans =

  29×4 table

                                                                                                                Content     SpeechDetected    SpeechContent        VUV    
                                                                                                                ________    ______________    _____________    ___________

    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav                ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav           ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav    unknown      { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Click-16-44p1-mono-0.2secs.wav                  ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav                speech       {10×2 table}     {10×2 table}     {5×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Engine-16-44p1-stereo-20sec.wav                 ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FemaleSpeech-16-8-mono-3secs.wav                speech       { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-44p1-stereo-25secs.mp3               music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-48-stereo-25secs.mp3                 music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Heli_16ch_ACN_SN3D.wav                          ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\JetAirplane-16-11p025-mono-16secs.wav           ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Laughter-16-8-mono-4secs.wav                    ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\MainStreetOne-24-96-stereo-63secs.wav           ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\NoisySpeech-16-22p5-mono-5secs.wav              speech       { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Rainbow-16-8-mono-114secs.wav                   speech       { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RainbowNoisy-16-8-mono-114secs.wav              speech       { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RandomOscThree-24-96-stereo-13secs.aif          music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-44p1-stereo-11secs.mp3                music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-48-stereo-11secs.mp3                  music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-44p1-stereo-72secs.wav            music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-96-stereo-72secs.flac             music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SoftGuitar-44p1_mono-10mins.ogg                 music        { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SpeechDFT-16-8-mono-5secs.wav                   speech       { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\TrainWhistle-16-44p1-mono-9secs.wav             ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Turbine-16-44p1-mono-22secs.wav                 ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-44p1-stereo-10secs.wav        ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-1000secs.wav           ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-200secs.wav            ambience     { 0×2 table}     { 0×2 table}     {0×2 table}
    C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WaveGuideLoopOne-24-96-stereo-10secs.aif        music        { 0×2 table}     { 0×2 table}     {0×2 table}

The file names associated with the labels are stored as a cell array in the Source property.

labeledSet_104620.Source
ans =

  29×1 cell array

    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'}
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Click-16-44p1-mono-0.2secs.wav'              }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Counting-16-44p1-mono-15secs.wav'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Engine-16-44p1-stereo-20sec.wav'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FemaleSpeech-16-8-mono-3secs.wav'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-44p1-stereo-25secs.mp3'           }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\FunkyDrums-48-stereo-25secs.mp3'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Heli_16ch_ACN_SN3D.wav'                      }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\JetAirplane-16-11p025-mono-16secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Laughter-16-8-mono-4secs.wav'                }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\MainStreetOne-24-96-stereo-63secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\NoisySpeech-16-22p5-mono-5secs.wav'          }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Rainbow-16-8-mono-114secs.wav'               }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RainbowNoisy-16-8-mono-114secs.wav'          }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RandomOscThree-24-96-stereo-13secs.aif'      }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-44p1-stereo-11secs.mp3'            }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockDrums-48-stereo-11secs.mp3'              }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-44p1-stereo-72secs.wav'        }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\RockGuitar-16-96-stereo-72secs.flac'         }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SoftGuitar-44p1_mono-10mins.ogg'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\SpeechDFT-16-8-mono-5secs.wav'               }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\TrainWhistle-16-44p1-mono-9secs.wav'         }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\Turbine-16-44p1-mono-22secs.wav'             }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-44p1-stereo-10secs.wav'    }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-1000secs.wav'       }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WashingMachine-16-8-mono-200secs.wav'        }
    {'C:\Program Files\MATLAB\R2019b\toolbox\audio\samples\WaveGuideLoopOne-24-96-stereo-10secs.aif'    }

Prepare Audio Datastore for Deep Learning Workflow

To continue on to a deep learning or machine learning workflow, use audioDatastore. An audio datastore gives you access to capabilities that are common in machine learning applications, such as splitEachLabel, which enables you to split your data into training and test sets.
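To illustrate what splitEachLabel does, the following sketch assigns hypothetical speech and other labels to the sample files. The labels depend only on whether a file name contains Speech. The sketch then places about 80% of the files of each label in a training set. The labels here are a plain categorical vector, not the exported label table.

```matlab
folder = fullfile(matlabroot,'toolbox','audio','samples');
ads = audioDatastore(folder);

% Hypothetical labels derived from file names, for illustration only
labels = repmat("other",numel(ads.Files),1);
labels(contains(ads.Files,'Speech')) = "speech";
ads.Labels = categorical(labels);

% Put about 80% of the files of each label in the training set, the rest in the test set
[adsTrain,adsTest] = splitEachLabel(ads,0.8);
countEachLabel(adsTrain)
countEachLabel(adsTest)
```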

Create an audio datastore for your labeled signal set. Specify the location of the audio files as the first argument of audioDatastore and set the Labels property of audioDatastore to the Labels property of the labeled signal set.

ADS = audioDatastore(labeledSet_104620.Source,'Labels',labeledSet_104620.Labels)

ADS = 

  audioDatastore with properties:

                       Files: {
                              ' ...\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav';
                              ' ...\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav';
                              ' ...\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'
                               ... and 26 more
                              }
                      Labels: 29-by-4 table
    AlternateFileSystemRoots: {}
              OutputDataType: 'double'

Call countEachLabel and specify the Content table variable to count the number of files that are labeled as ambience, music, speech, or unknown.

countEachLabel(ADS,'TableVariable','Content')
ans =

  4×2 table

    Content     Count
    ________    _____

    ambience     13  
    music         9  
    speech        6  
    unknown       1  

For examples of using labeled audio data in a machine learning or deep learning workflow, see:
