Low LSTM Accuracy in Speech Recognition

Hamza on 31 Oct 2023
Commented: Christopher McCausland on 6 Nov 2023
Hello everyone, I am applying an LSTM to speech emotion recognition. I extracted MFCC features, which gave a matrix of dimensions 60,575 × 39. I then converted this matrix into a 280 × 1 cell array named "AllCellTrain", containing sequences of varying lengths, as illustrated in the image below. I passed "AllCellTrain" to the trainNetwork function along with the labels YCA, the network layers, and the training options. However, the accuracy is very low, only around 20%. I'm unsure where I may have made a mistake. Could someone please offer some assistance?
num_features = 39;        % MFCC coefficients per frame (columns of the feature matrix)
num_classes = 7;          % number of emotion classes
num_hidden_units = 1024;
% Define the network: one LSTM layer that outputs only its last time step
layers = [
    sequenceInputLayer(num_features)
    lstmLayer(num_hidden_units, 'OutputMode', 'last')
    fullyConnectedLayer(num_classes)
    softmaxLayer
    classificationLayer];
% Specify the training options
max_epochs = 36;
mini_batch_size = 28;
initial_learning_rate = 0.001;
options = trainingOptions('adam', ...
    'MaxEpochs', max_epochs, ...
    'MiniBatchSize', mini_batch_size, ...
    'InitialLearnRate', initial_learning_rate, ...
    'SequenceLength', 'shortest', ...
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', 'gpu', ...
    'Verbose', false, ...
    'Plots', 'training-progress');
% Train on the training sequences, then evaluate on the test sequences
net = trainNetwork(AllCellTrain, YCA, layers, options);
predicted_labels = classify(net, AllCellTest, 'ExecutionEnvironment', 'gpu');
acc = mean(predicted_labels == YCT)
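For reference, the conversion from one stacked MFCC matrix to a cell array of sequences might look roughly like the sketch below. It is not the code used in this thread: the data is random, and `frames_per_utterance` and `all_frames` are hypothetical names. The one point it is meant to show is that trainNetwork expects each sequence as a numFeatures × numTimeSteps matrix, so blocks taken from a frames × features matrix need transposing.

```matlab
% Hypothetical stand-in data: 3 utterances with 5, 8 and 4 MFCC frames each.
num_features = 39;
frames_per_utterance = [5; 8; 4];   % assumed to be known from feature extraction
all_frames = rand(sum(frames_per_utterance), num_features);   % frames x features

% Split the stacked matrix row-wise into one block per utterance.
blocks = mat2cell(all_frames, frames_per_utterance, num_features);

% Transpose each block so rows are features and columns are time steps.
AllCellTrain = cellfun(@(x) x.', blocks, 'UniformOutput', false);

disp(size(AllCellTrain))      % size of the cell array of sequences
disp(size(AllCellTrain{2}))   % size of the second sequence
```

If the cells are left as frames × 39, sequenceInputLayer(39) would not see 39 features per time step, so checking `size(AllCellTrain{1})` before training is a cheap sanity test.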
  Comments: 4
Hamza on 6 Nov 2023
Edited: Hamza on 6 Nov 2023
Hi @Christopher McCausland, thanks for your answer. I am trying to classify 7 emotion classes. For your information, I used the same data with a 1D CNN and got 90% accuracy, so I don't know what the issue is with the LSTM. Also, when I shuffled the columns (the features) I got a different result, which shouldn't be the case. Please find the training curve attached. Thanks in advance!
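One way to test the feature-order observation in a controlled manner is to apply a single permutation to every sequence in both sets and to fix the random seed, since weight initialisation and per-epoch shuffling otherwise differ from run to run. The sketch below uses random stand-in data and assumes each sequence is stored as features × time steps; `perm`, `XTrain` and `XTest` are hypothetical names.

```matlab
rng(0);                        % fix the seed so the example is repeatable
num_features = 39;

% Hypothetical sequences, stored as features x time steps.
XTrain = {rand(num_features, 5); rand(num_features, 8)};
XTest  = {rand(num_features, 6)};

% Draw ONE permutation and reuse it for every sequence in both sets.
perm = randperm(num_features);
permute_rows = @(x) x(perm, :);
XTrainPerm = cellfun(permute_rows, XTrain, 'UniformOutput', false);
XTestPerm  = cellfun(permute_rows, XTest,  'UniformOutput', false);

% Reset the seed before each training run that is being compared, so both
% runs start from the same random state.
rng(0);
```

Even with a fixed seed, training on a GPU may not be exactly reproducible, so small differences between runs are not necessarily caused by the feature order.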
Christopher McCausland on 6 Nov 2023
Hi @Hamza,
To me this looks like classic overfitting: your model appears to train well and learn features, but these features are overfitted to the training data and are not representative of generalised data.
A few things to consider:
  1. Do you have multiple speakers? If so, how do you pick which speakers are in the test and train sets?
  2. You have 280 input sequences and seven classes. If the data is perfectly balanced, you have 40 observations per class. Is this enough?
  3. Can you include a validation split to prevent overfitting?
  4. These are just a few ways to prevent overfitting and ensure your data is appropriate for training; there are many others, which I would suggest you take a look at.
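The first three checks could be sketched as follows. This is only an illustration under stated assumptions: the labels, the `speakers` vector and the choice of held-out speaker are hypothetical stand-ins, not data from this thread, and the trainingOptions call requires Deep Learning Toolbox.

```matlab
% Hypothetical stand-in data: one label, one speaker ID and one sequence per utterance.
labels   = categorical(["angry";"happy";"sad";"angry";"happy";"sad";"angry";"happy"]);
speakers = [1; 1; 2; 2; 3; 3; 4; 4];     % assumed speaker IDs
X = arrayfun(@(n) rand(39, n), [5;8;4;6;7;5;9;4], 'UniformOutput', false);

% Check 1: split by speaker, so that no speaker appears in both sets.
val_speakers = 4;                         % assumed choice of held-out speaker
is_val   = ismember(speakers, val_speakers);
is_train = ~is_val;
assert(isempty(intersect(speakers(is_train), speakers(is_val))))

% Check 2: count the observations per class in the training portion.
class_counts = countcats(labels(is_train));
disp(table(categories(labels), class_counts, 'VariableNames', {'Class', 'Count'}))

% Check 3: monitor a validation set during training and stop early
% once the validation loss stops improving.
XVal = X(is_val);
YVal = labels(is_val);
options = trainingOptions('adam', ...
    'ValidationData', {XVal, YVal}, ...
    'ValidationFrequency', 10, ...
    'ValidationPatience', 5, ...
    'Verbose', false);
```

Holding out whole speakers rather than individual utterances gives a more honest estimate of how the network behaves on voices it has never heard.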
In terms of the CNN performance, were the test and train sets the same, and how many epochs did you train the CNN for?

Release: R2023b
