Neural networks stuttering detection using MFCC features

Rahaf Alenezi, 6 April 2019
Answered: jibrahim, 22 April 2019
Hello,
I am working on a project where I need to detect stuttered sounds in speech using a pattern recognition neural network. The network works fine at detecting non-stuttered sounds. However, although I have 777 samples of stuttered sounds and trained the network on 13 MFCC features, it still does not detect stuttering.
I think I might be extracting the features incorrectly, but I followed the documentation for the mfcc function.
The neural network results show that it does not work very well for stuttering detection. Below is my code for feature extraction; I'm using the pattern recognition network from nnstart. Could you please tell me what I'm doing wrong? Thank you in advance.
files = dir('data1s/*.wav');
unstutteredFeatures = [];
stutteredFeatures = [];
targetVector = zeros(2, numel(files));
for i = 1 : numel(files)
    fileName = strcat('data1s/', files(i).name);
    newSample = [];
    [x, fs] = audioread(fileName);
    [y] = mfcc(x, fs, 'LogEnergy', 'Replace');
    if fileName(8) == '0' % data were previously labeled
        newSample = vertcat(newSample, mean(y));
        unstutteredFeatures = vertcat(unstutteredFeatures, newSample);
        targetVector(1, i) = 1;
    else
        newSample = vertcat(newSample, mean(y));
        stutteredFeatures = vertcat(stutteredFeatures, newSample);
        targetVector(2, i) = 1;
    end
end
inputMatrix = vertcat(unstutteredFeatures, stutteredFeatures);
targetVector = targetVector';

Answers (1)

jibrahim, 22 April 2019
Hi Rahaf,
The mfcc function returns an L-by-13 matrix, where L is the number of frames the audio signal is partitioned into and 13 is the number of coefficients. In your code, you take the mean of the mfcc output over the frames, so you end up with a 1-by-13 vector for each file, regardless of how long the file is. This might not be sufficient data to decide whether stuttering occurred.
Your problem setup is somewhat similar to a gender classification example in the product:
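To make the shapes concrete, here is a minimal sketch that uses a synthetic noise signal in place of a real recording. The 16 kHz sample rate and 2-second length are arbitrary assumptions, not values from the question, and the code assumes Audio Toolbox is installed. It only illustrates how averaging over frames discards the time axis.

```matlab
% Synthetic stand-in for one recording (assumed values, for illustration only).
rng(0);                       % make the noise reproducible
fs = 16000;                   % assumed sample rate in Hz
x  = randn(2*fs, 1);          % 2 seconds of white noise, mono column vector

% Same call as in the question: log energy replaces the first coefficient.
coeffs = mfcc(x, fs, 'LogEnergy', 'Replace');   % L-by-13, one row per frame

% Averaging over frames (dimension 1) leaves a single row per file.
meanFeatures = mean(coeffs, 1);                 % 1-by-13

fprintf('Per-frame features: %d frames x %d coefficients\n', ...
    size(coeffs, 1), size(coeffs, 2));
fprintf('After mean:         %d x %d\n', ...
    size(meanFeatures, 1), size(meanFeatures, 2));
```

A short event inside a recording, such as a repeated syllable, affects only a few of the L rows, so it can be largely washed out in the 1-by-13 average.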
The example uses gtcc, but you can easily use mfcc instead (and drop the other features from the example if they do not help). Rather than reducing coefficients to their means, the entire sequence of coefficients is fed to a deep learning network.
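As a rough sketch of that idea (this is not the code from the gender classification example), the feature extraction from the question could keep every frame and pass the sequences to a small LSTM network. It assumes Audio Toolbox and Deep Learning Toolbox, plus the same data1s folder and '0' filename prefix for unstuttered files as in the question. The layer size and training options are placeholder assumptions that would need tuning, and a held-out test set is omitted for brevity.

```matlab
files = dir('data1s/*.wav');
numFiles = numel(files);
sequences = cell(numFiles, 1);        % one 13-by-L matrix per file
isStuttered = false(numFiles, 1);

for i = 1:numFiles
    [x, fs] = audioread(fullfile('data1s', files(i).name));
    x = mean(x, 2);                                  % mix to mono if stereo
    coeffs = mfcc(x, fs, 'LogEnergy', 'Replace');    % L-by-13, one row per frame
    sequences{i} = coeffs';                          % features-by-time for the network
    isStuttered(i) = files(i).name(1) ~= '0';        % labeling rule from the question
end
labels = categorical(isStuttered, [false true], {'unstuttered', 'stuttered'});

% Assumed architecture: a single LSTM layer that summarizes each sequence.
layers = [
    sequenceInputLayer(13)
    lstmLayer(50, 'OutputMode', 'last')
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];

% Assumed training settings, not tuned for this data set.
options = trainingOptions('adam', ...
    'MaxEpochs', 20, ...
    'MiniBatchSize', 32, ...
    'Shuffle', 'every-epoch', ...
    'Verbose', false);

net = trainNetwork(sequences, labels, layers, options);

% Accuracy on the training files only; this is an optimistic estimate.
predicted = classify(net, sequences);
trainAccuracy = mean(predicted == labels);
```

Because the features and the labels are filled in the same loop iteration, row i of labels always describes sequences{i}, whatever order dir returns the files in.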
HTH,
Jihad
