Not sure if I set up this neural network correctly

Question

Saketh Medicherla on 25 Dec 2020

https://kr.mathworks.com/matlabcentral/answers/702077-not-sure-if-i-set-up-this-neural-network-correctly

Edited: Brian Hemmat on 6 Jan 2021

Below is my code, along with information about the variables, for a basic audio classification problem: reading an audio file and deciding whether the signal is a car horn or a dog barking. I followed the same format as this tutorial I found: https://www.mathworks.com/help/audio/gs/classify-sound-using-deep-learning.html.

I'm not sure where I went wrong, but during training the program did not plot the loss value, and when I tried to test a sample file, the result was "<undefined>". I would appreciate any help on this.

% --------------------------------------------------------------
% Loading Training and Evaluation Sets for Car Horn and Dog Bark
% --------------------------------------------------------------
carDataStore = UrbanSound8K(UrbanSound8K.class == "car_horn",:);
carDataStore = carDataStore(carDataStore.salience == 1,:);
dogDataStore = UrbanSound8K(UrbanSound8K.class == "dog_bark",:);
dogDataStore = dogDataStore(dogDataStore.salience == 1,:);
carData = [];
dogData = [];
% Add first 2 seconds of each audiofile to their respective matrices and
% produce labels
for i = 1:height(carDataStore)
    thisfile = "UrbanSound8K\audio\fold" + string(carDataStore(i,:).fold) + "\" + string(carDataStore(i,:).slice_file_name);
    if audioinfo(thisfile).Duration >= 2 && audioinfo(thisfile).SampleRate == 44100
        [y,fs] = audioread(thisfile);
        samples = [1,2*fs];
        clear y fs;
        [y,fs] = audioread(thisfile, samples);
        carData = [carData,y(:,1)];
    end
end
carLabels = repelem(categorical("car horn"),width(carData),1);
for i = 1:height(dogDataStore)
    thisfile = "UrbanSound8K\audio\fold" + string(dogDataStore(i,:).fold) + "\" + string(dogDataStore(i,:).slice_file_name);
    if audioinfo(thisfile).Duration >= 2 && audioinfo(thisfile).SampleRate == 44100
        [y,fs] = audioread(thisfile);
        samples = [1,2*fs];
        clear y fs;
        [y,fs] = audioread(thisfile, samples);
        dogData = [dogData,y(:,1)];
    end
end
dogLabels = repelem(categorical("dog barking"),width(dogData),1);
dogVals = round(0.8*width(dogData));
carVals = round(0.8*width(carData));
audioTrain = [dogData(:,1:dogVals),carData(:,1:carVals)];
labelsTrain = [dogLabels(1:dogVals);carLabels(1:carVals)];
audioValidation = [dogData(:,(dogVals + 1):end),carData(:,(carVals + 1):end)];
labelsValidation = [dogLabels((dogVals + 1):end);carLabels((carVals + 1):end)];
% ---------------------------------------------------------
% Audio Feature Extractor to reduce dimensionality of audio,
% Extracting slope and centroid of mel spectrum over time
% ---------------------------------------------------------
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "SpectralDescriptorInput","melSpectrum", ...
    "spectralCentroid",true, ...
    "spectralSlope",true);
featuresTrain = extract(aFE,audioTrain);
[numHopsPerSequence,numFeatures,numSignals] = size(featuresTrain);
featuresTrain = permute(featuresTrain,[2,1,3]);
featuresTrain = squeeze(num2cell(featuresTrain,[1,2]));
numSignals = numel(featuresTrain);
[numFeatures,numHopsPerSequence] = size(featuresTrain{1});
featuresValidation = extract(aFE,audioValidation);
featuresValidation = permute(featuresValidation,[2,1,3]);
featuresValidation = squeeze(num2cell(featuresValidation,[1,2]));
% ----------------------------------------
% Defining the Neural Network Architecture
% ----------------------------------------
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(50,"OutputMode","last")
    fullyConnectedLayer(numel(unique(labelsTrain)))
    softmaxLayer
    classificationLayer];
options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,labelsValidation}, ...
    "Plots","training-progress", ...
    "Verbose",false);
net = trainNetwork(featuresTrain,labelsTrain,layers,options);

Answer 1 (Accepted Answer)

Brian Hemmat on 28 Dec 2020

https://kr.mathworks.com/matlabcentral/answers/702077-not-sure-if-i-set-up-this-neural-network-correctly#answer_585832

Edited: Brian Hemmat on 28 Dec 2020

Hi Saketh,

I believe the example you're following is more of a 'hello-world' type example--your current code is trying to accomplish something more difficult. You'll probably need to extract features with more information, and depending on your end goal, also apply standardization.
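For the standardization step, here is a minimal sketch, assuming the features are stored as a cell array with one numFeatures-by-numHops matrix per signal (the layout that trainNetwork expects for sequence input). The random stand-in data and the helper name `standardize` are hypothetical; the idea is to compute the per-feature mean and standard deviation on the training set only and apply them to both sets.

```matlab
% Stand-in data (hypothetical): 5 training and 2 validation sequences,
% 3 features each, with varying numbers of hops.
rng(0)
featuresTrain = arrayfun(@(n) 10 + 4*randn(3,n), [20;25;30;22;18], ...
    'UniformOutput', false);
featuresValidation = arrayfun(@(n) 10 + 4*randn(3,n), [21;27], ...
    'UniformOutput', false);

% Pool all training hops and compute per-feature statistics.
allTrain = [featuresTrain{:}];      % numFeatures-by-(total number of hops)
mu = mean(allTrain, 2);
sigma = std(allTrain, 0, 2);
sigma(sigma == 0) = 1;              % guard against constant features

% Apply the training statistics to both sets.
standardize = @(x) (x - mu) ./ sigma;
featuresTrain = cellfun(standardize, featuresTrain, 'UniformOutput', false);
featuresValidation = cellfun(standardize, featuresValidation, 'UniformOutput', false);
```

Using the training statistics for the validation set keeps the validation data from influencing the preprocessing.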

Regarding your particular questions and why the network is not working, it's difficult to say without being able to walk through your code (which would require access to that dataset, which I don't have).

Below, I've written something that is similar to your code but uses the ESC-10 dataset, which can be downloaded from the MathWorks support files. Hopefully reading through it will help with your current problem.

I changed the extracted features to the MFCC plus the delta and delta-delta MFCC. The dataset does not have car sounds, so we're doing "dog" and "helicopter" instead. Instead of trimming the signals, we pass in cell arrays of features and tell the network how to trim the sequences if they're not the same length. The amount of training and validation data is tiny, so we lower the ValidationFrequency value (so that validation runs more often) to make sure validation data is plotted (this might be a similar issue to why you're not seeing loss).

% Download dataset
url = 'https://ssd.mathworks.com/supportfiles/audio/ESC-10.zip';
outputLocation = tempdir;
unzip(url,outputLocation)
% Create audioDatastore to point to dataset. Use the folder names as the
% labels.
esc10Datastore = audioDatastore(fullfile(outputLocation,'ESC-10'), ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
% Subset to only include 'dog' and 'helicopter' labels.
ads = subset(esc10Datastore,esc10Datastore.Labels==categorical("dog") | ...
    esc10Datastore.Labels==categorical("helicopter"));
% Split the datastore into train and validation sets.
[adsTrain,adsValidation] = splitEachLabel(ads,0.8);
% Read a single signal from the train datastore and listen to it.
[audioIn,audioInfo] = read(adsTrain);
fs = audioInfo.SampleRate;
sound(audioIn,fs)
% Create an audioFeatureExtractor
aFE = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "mfccDelta",true, ...
    "mfccDeltaDelta",true);
% Get the number of features output per signal
features = extract(aFE,audioIn);
[numHops,numFeatures] = size(features);
% Read all audio data into memory
dataTrain = readall(adsTrain);
labelsTrain = removecats(adsTrain.Labels); %remove empty categories
dataValidation = readall(adsValidation);
labelsValidation = removecats(adsValidation.Labels);
% Extract features from all the data (assumes the entire dataset uses the
% same sample rate, 44.1 kHz).
featuresTrain = cellfun(@(x)(extract(aFE,x))',dataTrain,'UniformOutput',false);
featuresValidation = cellfun(@(x)(extract(aFE,x))',dataValidation,'UniformOutput',false);
% Define the architecture
layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(100,"OutputMode","last") % <-- increased number of hidden units
    fullyConnectedLayer(numel(unique(labelsTrain)))
    softmaxLayer
    classificationLayer];
% Define the training options
options = trainingOptions("adam", ...
    "Shuffle","every-epoch", ...
    "ValidationData",{featuresValidation,labelsValidation}, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "SequenceLength","shortest", ... % <-- Specify the sequence length (try experimenting with different options)
    "ValidationFrequency",20);
% Train the network
net = trainNetwork(featuresTrain,labelsTrain,layers,options);

% Evaluate performance on the validation set
y = classify(net,featuresValidation);
accuracy = mean(y==labelsValidation);
cm = confusionchart(labelsValidation,y);
cm.Title = sprintf('Confusion Matrix for Validation Data (Accuracy = %0.2f)',accuracy);
cm.ColumnSummary = 'column-normalized';
cm.RowSummary = 'row-normalized';
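
The validation-frequency point can be checked with a little arithmetic. This sketch assumes the trainingOptions defaults (MiniBatchSize 128, MaxEpochs 30, ValidationFrequency 50) and roughly 64 training clips (80% of an assumed 40 clips per class for two ESC-10 classes). None of these numbers are stated in the code above, so adjust them to your own setup.

```matlab
numObservations = 64;    % assumed: 80% of 2 classes x 40 clips
miniBatchSize = 128;     % assumed trainingOptions default
maxEpochs = 30;          % assumed trainingOptions default

% Assumption: incomplete final mini-batches are discarded, and a data set
% smaller than one mini-batch is processed as a single batch per epoch.
iterationsPerEpoch = max(1, floor(numObservations / miniBatchSize));
totalIterations = iterationsPerEpoch * maxEpochs;

defaultFrequency = 50;   % assumed default ValidationFrequency
chosenFrequency = 20;    % value used in the code above
validationsWithDefault = floor(totalIterations / defaultFrequency);
validationsWithChosen = floor(totalIterations / chosenFrequency);

fprintf('Total iterations: %d\n', totalIterations);
fprintf('Periodic validations at frequency %d: %d\n', ...
    defaultFrequency, validationsWithDefault);
fprintf('Periodic validations at frequency %d: %d\n', ...
    chosenFrequency, validationsWithChosen);
```

Under these assumptions training ends after 30 iterations, so a validation frequency of 50 is never reached during training, while a frequency of 20 is reached once.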

Comments (2)

Saketh Medicherla on 5 Jan 2021

Thank you for your answer! I have another quick question: Is it necessary to have the same number of audio files for each category to achieve as high an accuracy as possible? I've tested the approach you have provided above with the database I am using (UrbanSound8K), and I'm seeing results of around 75-80% accuracy. I'm assuming this is due to the discrepancy of the available files (645 dog barking, 153 car horn), but I am not completely sure and would appreciate your input.

Brian Hemmat on 5 Jan 2021

Edited: Brian Hemmat on 6 Jan 2021

Hi Saketh,

You'll generally receive the best results if you train using a balanced class distribution. But that's just one of many contributing factors to accuracy.

One approach to dealing with unbalanced class distributions is to use a weighted classification layer. Speech Command Recognition Using Deep Learning uses a weighted classification layer. It's a custom layer and a bit of an advanced maneuver. Also, the example uses a CNN, and I'm not positive a weighted classification layer will improve performance on an LSTM network as well.
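
As a rough sketch of where the weights for such a layer could come from, the snippet below computes inverse-frequency class weights for the counts mentioned in the question (645 dog barking, 153 car horn). The label vector is synthetic, and normalizing by the mean weight is just one common choice. In releases where the built-in classificationLayer accepts Classes and ClassWeights (I believe R2020b and later, but check the documentation for your release), the weights could be passed there instead of writing a custom layer; that call is left commented out because it is an assumption.

```matlab
% Synthetic label vector with the class counts from the question.
labels = [repmat(categorical("dog barking"), 645, 1); ...
          repmat(categorical("car horn"), 153, 1)];

classes = categories(labels);    % class names as a cell array
counts = countcats(labels);      % counts in the same order as classes

% Inverse-frequency weights, normalized so the average weight is 1.
classWeights = 1 ./ counts;
classWeights = classWeights / mean(classWeights);

disp(table(classes, counts, classWeights))

% Assumed usage (not verified against every release):
% outputLayer = classificationLayer('Classes', classes, ...
%     'ClassWeights', classWeights);
```

The rarer class ends up with the larger weight, so its errors contribute more to the loss.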

Another approach would be to augment your dataset using audioDataAugmenter.
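
A minimal sketch of augmentation with audioDataAugmenter (Audio Toolbox), using a noise signal as a stand-in for a real clip. The probabilities and ranges are arbitrary illustration values, not recommendations; time stretching and pitch shifting are switched off here only to keep the sketch simple.

```matlab
fs = 44100;
audioIn = 0.1 * randn(2*fs, 1);      % stand-in for a 2 s mono clip

augmenter = audioDataAugmenter( ...
    "AugmentationMode", "sequential", ...
    "NumAugmentations", 3, ...
    "TimeStretchProbability", 0, ...
    "PitchShiftProbability", 0, ...
    "VolumeControlProbability", 0.8, ...
    "VolumeGainRange", [-6, 6], ...
    "AddNoiseProbability", 0.5, ...
    "SNRRange", [10, 20], ...
    "TimeShiftProbability", 0.8, ...
    "TimeShiftRange", [-0.25, 0.25]);

% augment returns a table with one row per augmented copy.
augmented = augment(augmenter, audioIn, fs);
firstClip = augmented.Audio{1};
```

For the unbalanced case in the question, one option would be to generate augmented copies only for the under-represented class (car horn) and add them to the training set, leaving the validation set untouched.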

Another approach is to use a pretrained network. You could use something like classifySound off-the-shelf, or you could use the underlying YAMNet network and perform transfer learning for your specific task, as in this example: Transfer Learning Using YAMNet.

One other thing to keep in mind: In the code example I provided previously, I created the validation set as a percentage (20%) of the entire data set. This assumed that the classes are roughly balanced. Usually, if you have unbalanced classes for training, you'll still want balanced classes for validation/testing to get a fair assessment (although this depends on your final application and desired performance). You can use splitEachLabel and specify the number of files to create balanced validation or test sets: Split by Number of Files.
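
A small sketch of splitting by number of files, assuming Audio Toolbox is available. It writes a tiny unbalanced stand-in dataset to a temporary folder (the folder layout, file names, and counts are hypothetical) and then holds out the same number of files from each label for validation.

```matlab
% Build a tiny unbalanced stand-in dataset on disk.
rootFolder = tempname;               % unique temporary folder
fs = 8000;
labelNames = {'dog_bark', 'car_horn'};
numFiles = [6, 3];
for c = 1:numel(labelNames)
    classFolder = fullfile(rootFolder, labelNames{c});
    mkdir(classFolder);
    for k = 1:numFiles(c)
        fileName = fullfile(classFolder, ...
            sprintf('%s_%d.wav', labelNames{c}, k));
        audiowrite(fileName, 0.1*randn(fs, 1), fs);
    end
end

ads = audioDatastore(rootFolder, ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Hold out the same number of files from every label for validation;
% the remaining files go to training.
numValidationPerLabel = 2;
[adsValidation, adsTrain] = splitEachLabel(ads, numValidationPerLabel, ...
    'randomized');

disp(countEachLabel(adsValidation))
disp(countEachLabel(adsTrain))
```

The validation set then has two files per label, while the training set keeps the original imbalance (four versus one in this toy example).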

Good luck!

Answer 2

Anshika Chaurasia on 29 Dec 2020

https://kr.mathworks.com/matlabcentral/answers/702077-not-sure-if-i-set-up-this-neural-network-correctly#answer_586142

Hi Saketh,

You can also refer to the File Exchange submission Classify Urban Sound using Machine Learning & Deep Learning, which contains a script that classifies the UrbanSound8K dataset using wavelet analysis and deep learning.

Note: Classify Urban Sound using Machine Learning & Deep Learning is one of several submissions on MATLAB File Exchange on MATLAB Central, a forum where our product users interact and exchange information and knowledge without MathWorks' involvement. Feel free to contact the author of this submission directly with specific questions about the implementation.
