how to handle huge volume of audio features

Question

hema purad, 20 September 2022


https://kr.mathworks.com/matlabcentral/answers/1808710-how-to-handle-huge-volume-of-audio-features

Commented: hema purad, 22 September 2022

Accepted Answer: Gabriele Bunkheila


I have to find the Cartesian product of audio features. I have selected MFCC, Bark spectrum, and pitch as features.

I have 10 classes of 1000 samples in my audio database.

I expected MFCC, Bark spectrum, and pitch to each produce a single row of N, M, and Q features per audio sample. Instead, each one generates a matrix: 970×14 for MFCC, 970×32 for the Bark spectrum, and 970×1 for pitch. I thought the 970 rows corresponded to 970 training audio samples, but that is not the case, and the feature sizes change from file to file. Please look at my code and tell me how to handle this large volume of audio data. I need to find the Cartesian product of these features, and I should get one row per audio sample containing each feature.

clc
clear all
clear variables
close all
%------------------------------------
%Create an audioDatastore
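db = fullfile('C:/DBs/AudioDB');   % path to the audio database on the author's machine
audiodb = audioDatastore(db,'IncludeSubfolders',true,'LabelSource','foldernames');
% Output from running this on the forum page, where the folder does not exist:
% Cannot find files or folders matching: 'C:/DBs/AudioDB'.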
%------------------------------------
[sample, dsInfo] = read(audiodb);
reset(audiodb)
fs = dsInfo.SampleRate;
windowLength = round(0.03*fs);
overlapLength = round(0.025*fs);
%--------------------------------
allfeatures=[];
labels_allfeatures=[];
%------------------------------------
while hasdata(audiodb)
    [audioIn,adsInfo] = read(audiodb);
    %------MFCC FEATURES------------------------------
    MFCC1 = mfcc(audioIn,fs,'Window',hamming(windowLength,'periodic'),'OverlapLength',overlapLength);

    %------BARK FEATURES------------------------------
    afe = audioFeatureExtractor('SampleRate',fs, ...
        'Window',hamming(windowLength,'periodic'), ...
        'OverlapLength',overlapLength, ...
        'barkSpectrum',true);
    %setExtractorParams(afe,"barkSpectrum","NumBands",64)
    BSP1 = extract(afe,audioIn);
    %save barSp.mat barSp
    %------PITCH FEATURES-----------------------------
    P1 = pitch(audioIn,fs,'WindowLength',windowLength,'OverlapLength',overlapLength);
    %save pit.mat pit
    feat = [MFCC1,BSP1,P1];   % concatenate the per-frame features column-wise
    %-------------------------------------------------
    voicedSpeech = isVoicedSpeech(audioIn,fs,windowLength,overlapLength);
    feat(~voicedSpeech,:) = [];   % drop frames that are not voiced speech
    label = repelem(adsInfo.Label,size(feat,1));   % one label per remaining frame
    %-------------------------------------------------
    allfeatures = [allfeatures;feat];
    labels_allfeatures = [labels_allfeatures,label];
end %end of while
save allfeatures.mat  allfeatures
save labels_allfeatures.mat labels_allfeatures
%---------------------------------
%%
%Supporting Functions
function voicedSpeech = isVoicedSpeech(x,fs,windowLength,overlapLength)
pwrThreshold = -40;
[segments,~] = buffer(x,windowLength,overlapLength,'nodelay');
pwr = pow2db(var(segments));
isSpeech = (pwr > pwrThreshold);
zcrThreshold = 1000;
zeroLoc = (x==0);
crossedZero = logical([0;diff(sign(x))]);
crossedZero(zeroLoc) = false;
[crossedZeroBuffered,~] = buffer(crossedZero,windowLength,overlapLength,'nodelay');
zcr = (sum(crossedZeroBuffered,1)*fs)/(2*windowLength);
isVoiced = (zcr < zcrThreshold);
voicedSpeech = isSpeech & isVoiced;
end

Please help me.


Answer 1

Gabriele Bunkheila, 21 September 2022


https://kr.mathworks.com/matlabcentral/answers/1808710-how-to-handle-huge-volume-of-audio-features#answer_1057645

Edited: Gabriele Bunkheila, 21 September 2022


Hi Hema, the general way feature extraction works for signals is that a set of features is extracted for every buffer (frame) of samples. The longer the file, the more buffers fit across its length. This is why you are seeing a variable number of feature rows for each audio file: each audio file has a different number of audio samples.
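As a rough illustration of how the row count follows from the file length, here is a minimal sketch using the 30 ms window and 25 ms overlap from the question. The sample rate and the three file durations are assumed values chosen for the example, and the frame-count formula is the usual one for full, non-padded frames; the exact count a given function returns may differ by a frame depending on how it treats the end of the signal.

```matlab
% Assumed values for illustration only (not taken from the original data)
fs = 16000;                                 % assumed sample rate in Hz
windowLength  = round(0.03*fs);             % 30 ms window, as in the question
overlapLength = round(0.025*fs);            % 25 ms overlap, as in the question
hopLength = windowLength - overlapLength;   % samples between frame starts

durations  = [2 3.5 5];                     % assumed file lengths in seconds
numSamples = round(durations*fs);

% Number of full frames that fit in each file: one feature row per frame
numFrames = floor((numSamples - windowLength)/hopLength) + 1;
disp(numFrames)
```

Each file yields a different number of rows, even though the number of columns (features per frame) stays the same.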

By default, audioFeatureExtractor uses Hamming windows of 1024 samples (around 23 ms at 44.1 kHz) that overlap by 512 samples. You can set any window length you like, though the length is usually chosen based on the type of signal and on which signal characteristics you expect to be most discriminative for your problem.
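The conversion between window length in samples and duration in milliseconds can be checked with a few lines of arithmetic. This sketch uses the default values quoted above; the 30 ms target is only an example value, borrowed from the question's code.

```matlab
fs = 44100;              % sample rate in Hz
windowLength  = 1024;    % default window length in samples
overlapLength = 512;     % default overlap in samples

windowMs = 1000*windowLength/fs;                  % window duration in ms
hopMs    = 1000*(windowLength - overlapLength)/fs; % time between frame starts in ms
fprintf('window: %.2f ms, hop: %.2f ms\n', windowMs, hopMs);

% Going the other way: pick a duration, then convert it to samples
targetMs = 30;                            % example duration
targetWindow = round(targetMs/1000*fs);   % window length in samples at this fs
disp(targetWindow)
```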

If you are really looking for only a single row of features per file, then you may want to apply some kind of averaging metric (e.g. mean, median, ...) across the rows of the feat array. In the code above, that would happen after the call to isVoicedSpeech, if relevant. For example:

% [...]
feat(~voicedSpeech,:) = [];
feat = median(feat,1);
label = adsInfo.Label; % (no more need to size-extend file label)
% [...]
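To show the effect on the accumulated array, here is a small self-contained sketch that uses random matrices as stand-ins for the per-frame features of two files. The frame counts and the cell array `files` are made up for the example; the 47 columns correspond to the 14 + 32 + 1 feature columns reported in the question. Because each file now contributes exactly one row, the output can be preallocated instead of grown inside the loop.

```matlab
% Synthetic stand-ins for per-frame features (rows = frames, columns = features)
rng(0);                          % fixed seed so the example is repeatable
numFeatures = 14 + 32 + 1;       % MFCC + Bark spectrum + pitch columns
featFileA = randn(970,numFeatures);   % hypothetical file with 970 frames
featFileB = randn(412,numFeatures);   % hypothetical file with 412 frames
files = {featFileA, featFileB};

numFiles = numel(files);
allfeatures = zeros(numFiles,numFeatures);   % one row per file

for k = 1:numFiles
    feat = files{k};
    allfeatures(k,:) = median(feat,1);   % collapse frames; mean(feat,1) also works
end

disp(size(allfeatures))
```

One caveat worth keeping in mind: if every frame of a file is removed by the voiced-speech check, `median` or `mean` over the empty array returns NaN values for that row, so such files may need to be skipped or handled separately.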

I hope this helps.


hema purad, 22 September 2022

Thank you so much, Gabriele. I calculated the mean of each feature type so that it fits in a single row, which made the further calculation easy. Thank you for your answer; it is working.
