Low LSTM Accuracy in Speech Recognition

Question

Hamza, 31 October 2023

https://kr.mathworks.com/matlabcentral/answers/2041026-low-lstm-accuracy-in-speech-recognition

Commented: Christopher McCausland, 6 November 2023

Hello everyone, I am applying an LSTM to speech emotion recognition. I extracted MFCC features, which gave a matrix of size 60,575 × 39. I then transformed this matrix into a 280 × 1 cell array named "AllCellTrain" containing signals of varying lengths, as illustrated in the image below. I used "AllCellTrain" as the input to the trainNetwork function, along with the labels YCA, the network layers, and the training options. However, the accuracy is very low, only around 20%. I'm unsure where I went wrong. Could someone please offer some assistance?
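The question does not show how the 60,575 × 39 MFCC matrix was split into AllCellTrain. Below is a minimal sketch of one way to do it, assuming the number of frames in each of the 280 utterances is known; the vector frameCounts and the random placeholder matrix are hypothetical stand-ins. The detail that matters is orientation: for sequence input, trainNetwork expects each cell to hold a features-by-time matrix, so every 39-column block has to be transposed to 39 × T.

```matlab
% Hypothetical stand-ins: replace with the real stacked MFCC matrix and the
% real number of frames per utterance (frameCounts must sum to 60575).
features = randn(60575, 39);                             % frames x 39 MFCC features
frameCounts = repmat(216, 280, 1);                       % hypothetical frames per utterance
frameCounts(end) = 60575 - sum(frameCounts(1:end-1));   % make the counts sum to 60575

% Split the rows into one block per utterance: a 280x1 cell of T_i x 39 matrices
cellsTimeByFeature = mat2cell(features, frameCounts, 39);

% sequenceInputLayer(39) expects each sequence as features x time (39 x T_i),
% so transpose every block
AllCellTrain = cellfun(@(x) x.', cellsTimeByFeature, 'UniformOutput', false);

size(AllCellTrain)       % -> [280 1]
size(AllCellTrain{1})    % -> [39 216]
```

With the real data, checking that size(AllCellTrain{k}, 1) is 39 for every k before calling trainNetwork catches a transposed layout early.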

num_features = 39;       % MFCC coefficients per frame; each cell must be 39 x T
num_classes = 7;         % seven emotion classes
num_hidden_units = 1024;
layers = [
    sequenceInputLayer(num_features)
    lstmLayer(num_hidden_units, 'OutputMode', 'last')   % one output per sequence
    fullyConnectedLayer(num_classes)
    softmaxLayer
    classificationLayer];

% Specify the training options
max_epochs = 36;
mini_batch_size = 28;
initial_learning_rate = 0.001;
options = trainingOptions('adam', ...
    'MaxEpochs', max_epochs, ...
    'MiniBatchSize', mini_batch_size, ...
    'InitialLearnRate', initial_learning_rate, ...
    'SequenceLength', 'shortest', ...   % truncates every sequence in a mini-batch to the shortest one
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', 'gpu', ...
    'Verbose', false, ...
    'Plots', 'training-progress');

% YCA and YCT must be categorical label vectors
net = trainNetwork(AllCellTrain, YCA, layers, options);
predicted_labels = classify(net, AllCellTest, 'ExecutionEnvironment', 'gpu');
acc = mean(predicted_labels == YCT)
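With 'SequenceLength','shortest', every sequence in a mini-batch is cut to the length of the shortest one, so when utterance lengths vary a lot, much of each longer recording never reaches the LSTM. A sketch of an alternative that may be worth trying (an assumption, not something tested in this thread): pad to the longest sequence instead, and pad on the left, since MATLAB's documentation notes that padding at the end can degrade the output of an LSTM layer with 'OutputMode','last'. The other values are copied from the question.

```matlab
% Alternative to truncation (assumption, not the original poster's setup):
% pad each mini-batch to its longest sequence, and put the padding on the
% left so the last time step read by lstmLayer(..., 'OutputMode','last')
% is real MFCC data rather than padding.
options = trainingOptions('adam', ...
    'MaxEpochs', 36, ...
    'MiniBatchSize', 28, ...
    'InitialLearnRate', 0.001, ...
    'SequenceLength', 'longest', ...
    'SequencePaddingDirection', 'left', ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false);
```

Padding increases the cost of each mini-batch when lengths differ widely, but no frames are discarded.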

Comments: 4 (2 older comments not shown)

Hamza, 6 November 2023 (edited)

Hi @Christopher McCausland, thanks for your answer. I am trying to classify 7 emotion classes. For your information, I used the same data with a 1D CNN and got 90% accuracy, so I don't know what the issue with the LSTM is. Also, when I shuffled the columns (the features), I got different results, which shouldn't be the case. You can find the training curve attached. Thanks in advance!
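On the feature-shuffling observation: an LSTM's input weights can absorb any fixed reordering of the 39 input features, so a permutation applied identically to the training and test sets should not change the achievable accuracy; differences between runs are more likely due to random weight initialization and per-epoch mini-batch shuffling. Below is a minimal sketch on hypothetical toy data that draws one permutation, applies it to both sets, and checks that the features stay aligned. Because each cell is 39 × T, the "columns" of the stacked matrix are rows here.

```matlab
% Hypothetical toy data standing in for AllCellTrain / AllCellTest
% (each cell is 39 features x T time steps)
rng(0);
AllCellTrain = {randn(39, 120); randn(39, 95)};
AllCellTest  = {randn(39, 80)};

% Draw ONE permutation of the 39 feature rows and apply it to both sets
perm = randperm(39);
permTrain = cellfun(@(x) x(perm, :), AllCellTrain, 'UniformOutput', false);
permTest  = cellfun(@(x) x(perm, :), AllCellTest,  'UniformOutput', false);

% Original feature 1 now sits in the same row of every training and test sequence
row = find(perm == 1);
sameTrain = isequal(permTrain{1}(row, :), AllCellTrain{1}(1, :))   % true
sameTest  = isequal(permTest{1}(row, :),  AllCellTest{1}(1, :))    % true

% To compare two trainings fairly, reset the seed right before each call, e.g.
%   rng(42); net = trainNetwork(permTrain, YCA, layers, options);
% (training on the GPU may still not be bit-for-bit repeatable)
```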

Christopher McCausland, 6 November 2023

Hi @Hamza,

To me this looks like classic overfitting: your model appears to train well and learn features, but those features are overfitted to the training data and are not representative of generalised data.

A few things to consider:

Do you have multiple speakers? If so, how do you pick which speakers are in the test/train set?
You have 280 input sequences and seven classes; if the data is perfectly balanced, that is 40 observations per class. Is this enough?
Can you include a validation split to prevent overfitting?
These are just a few ways to prevent overfitting and to ensure your data is appropriate for training; there are many others which I would suggest you look into.
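The three checks above can be sketched in code. This is a minimal sketch, assuming each utterance has a known speaker ID; speakerIDs, the toy sequences and labels, and the choice of 2 test and 2 validation speakers are all hypothetical. It counts observations per class, assigns whole speakers to exactly one of the training, validation or test subsets, and passes the validation subset to trainingOptions so validation loss is tracked and training stops once it stops improving.

```matlab
% Hypothetical metadata and data for the 280 utterances: replace with real values.
rng(1);
numUtterances = 280;
speakerIDs = randi(10, numUtterances, 1);            % hypothetical: 10 speakers
labels = categorical(randi(7, numUtterances, 1));    % stand-in for the emotion labels
AllCell = arrayfun(@(k) randn(39, 100 + mod(k, 50)), (1:numUtterances)', ...
    'UniformOutput', false);                         % stand-in 39 x T sequences

% 1) Class balance: a perfectly balanced set gives 280/7 = 40 per class
classCounts = countcats(labels)

% 2) Speaker-independent split: each speaker lands in exactly one subset
speakers = unique(speakerIDs);
speakers = speakers(randperm(numel(speakers)));      % shuffle speaker order
testSpeakers = speakers(1:2);                        % hypothetical: 2 test speakers
valSpeakers  = speakers(3:4);                        % hypothetical: 2 validation speakers
isTest  = ismember(speakerIDs, testSpeakers);
isVal   = ismember(speakerIDs, valSpeakers);
isTrain = ~isTest & ~isVal;

XTrain = AllCell(isTrain);  YTrain = labels(isTrain);
XTest  = AllCell(isTest);   YTest  = labels(isTest);

% 3) Validation data adds validation loss to the training-progress plot, and
%    ValidationPatience stops training after 5 checks without improvement
options = trainingOptions('adam', ...
    'MaxEpochs', 36, ...
    'MiniBatchSize', 28, ...
    'ValidationData', {AllCell(isVal), labels(isVal)}, ...
    'ValidationFrequency', 10, ...
    'ValidationPatience', 5, ...
    'Verbose', false);
```

Splitting by speaker rather than by utterance means the test accuracy measures generalisation to unseen voices, which is usually harder than a random utterance split.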

In terms of the CNN performance, were the test/train sets the same, and how many epochs did you train the CNN for?

Answers (0)