How do I adapt the "Denoise Speech Using Deep Learning Networks" example to the TIMIT dataset?

Question

studentmatlaber on 20 Feb 2022

https://kr.mathworks.com/matlabcentral/answers/1654440-how-do-i-adapt-the-denoise-speech-using-deep-learning-networks-example-to-the-timit-dataset

Commented: Brian Hemmat on 25 Feb 2022

Example: https://www.mathworks.com/help/deeplearning/ug/denoise-speech-using-deep-learningnetworks.html#DenoiseSpeechUsingDeepLearningNetworksExample-5

The audio in the TIMIT dataset is sampled at 16 kHz, and I don't want to change that; I want to run this example on 16 kHz audio. I skipped the "Examine the Dataset" part of the example for my own dataset. I also left out the "src" part of the "STFT Targets and Predictors" section, since I won't be doing any sample rate conversion. However, the "Extract Features Using Tall Arrays" section uses src, which is not defined in my code because I skipped the conversion. How can I write the code without these conversions? I would appreciate any help.

Also, where is it set how many voices will be selected for training? For example, I want to use 25 random sounds from my dataset for training.

adsTrain = audioDatastore(fullfile('D:\','BİTİRME PROJESİ','TIMIT','data','TRAIN'),'IncludeSubfolders',true);
reduceDataset = true;
if reduceDataset
    adsTrain = shuffle(adsTrain);
    adsTrain = subset(adsTrain,1:1000);
end
[audio,adsTrainInfo] = read(adsTrain);
sound(audio,adsTrainInfo.SampleRate)
figure
t = (1/adsTrainInfo.SampleRate) * (0:numel(audio)-1);
plot(t,audio)
title("Example Speech Signal")
xlabel("Time (s)")
grid on

Answer 1

Brian Hemmat on 23 Feb 2022

https://kr.mathworks.com/matlabcentral/answers/1654440-how-do-i-adapt-the-denoise-speech-using-deep-learning-networks-example-to-the-timit-dataset#answer_902790

The suggestions below are with respect to the R2021b release. If you are working from a different release, or these changes don't work for you, let me know.

Regarding not performing the sample rate conversion: Probably the simplest option is to modify the helper function as follows:

function [targets,predictors] = HelperGenerateSpeechDenoisingFeatures(audio,noise)%,src) <-CHANGE
% HelperGenerateSpeechDenoisingFeatures: Get target and predictor STFT
% signals for speech denoising.
% audio: Input audio signal
% noise: Input noise signal
% (src, the sample rate converter, is no longer an input)
% Copyright 2018 The MathWorks, Inc.
WindowLength = 512;%256; <-CHANGE
win          = hamming(WindowLength,'periodic');
Overlap      = round(0.75 * WindowLength);
FFTLength    = WindowLength;
NumFeatures  = FFTLength/2 + 1;
NumSegments  = 8;
% D            = 48/8; % Decimation factor  <-CHANGE
% L            = floor( numel(audio)/D);    <-CHANGE
% audio        = audio(1:D*L);              <-CHANGE
%                                           <-CHANGE
% audio = src(audio);                       <-CHANGE
% reset(src)                                <-CHANGE
randind      = randi(numel(noise) - numel(audio) , [1 1]);
noiseSegment = noise(randind : randind + numel(audio) - 1);
noisePower   = sum(noiseSegment.^2);
cleanPower   = sum(audio.^2);
noiseSegment = noiseSegment .* sqrt(cleanPower/noisePower);
noisyAudio   = audio + noiseSegment;
cleanSTFT = stft(audio, 'Window',win, 'OverlapLength', Overlap, 'FFTLength',FFTLength);
cleanSTFT = abs(cleanSTFT(NumFeatures-1:end,:));
noisySTFT = stft(noisyAudio, 'Window',win, 'OverlapLength', Overlap, 'FFTLength',FFTLength);
noisySTFT = abs(noisySTFT(NumFeatures-1:end,:));
noisySTFTAugmented = [noisySTFT(:,1:NumSegments-1) noisySTFT];
 
STFTSegments = zeros(NumFeatures, NumSegments , size(noisySTFTAugmented,2) - NumSegments + 1);
for index = 1 : size(noisySTFTAugmented,2) - NumSegments + 1
    STFTSegments(:,:,index) = noisySTFTAugmented(:,index:index+NumSegments-1);
end
targets    = cleanSTFT;
predictors = STFTSegments;

Then remove the src argument from the call to the helper function in the live script (line 86):

% [targets,predictors] = cellfun(@(x)HelperGenerateSpeechDenoisingFeatures(x,noise,src),T,"UniformOutput",false);
[targets,predictors] = cellfun(@(x)HelperGenerateSpeechDenoisingFeatures(x,noise),T,"UniformOutput",false);

You will need to modify the definition of the window length in the script as well:

windowLength = 512;%256;
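
The change from 256 to 512 keeps the analysis window at the same duration once the sample rate doubles. A minimal sketch of that arithmetic, assuming the original example works at 8 kHz with a 256-sample window (fsOriginal, winOriginal, fsTarget and winDuration are illustrative names, not variables from the example):

```matlab
fsOriginal  = 8e3;     % rate the original example converts to (assumed)
winOriginal = 256;     % window length in the original example
fsTarget    = 16e3;    % TIMIT sample rate

winDuration  = winOriginal/fsOriginal;       % window duration in seconds
windowLength = round(winDuration*fsTarget);  % same duration at 16 kHz
overlap      = round(0.75*windowLength);     % 75% overlap, as in the helper
numFeatures  = windowLength/2 + 1;           % number of one-sided STFT bins

fprintf('windowLength = %d, overlap = %d, numFeatures = %d\n', ...
    windowLength,overlap,numFeatures)
```

Because the number of features grows with the window length, anything in the script that is sized from it should be derived from windowLength rather than hard-coded.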

Note that the washing machine noise is sampled at 8 kHz, so you will need to either resample it to 16 kHz or find a noise signal that was recorded at 16 kHz. Upsampling the noise would not be a realistic scenario, since the upsampled signal contains no content above 4 kHz.
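
If you do decide to resample the noise, one option is the resample function from Signal Processing Toolbox. This is only a sketch: a random placeholder vector stands in for the washing machine recording, and fsNoise, fsTarget and noise16k are illustrative names.

```matlab
fsNoise  = 8e3;                  % sample rate of the noise recording
fsTarget = 16e3;                 % sample rate of the TIMIT speech

noise = randn(2*fsNoise,1);      % placeholder for the real noise signal

[p,q]    = rat(fsTarget/fsNoise);   % rational resampling factor, p = 2, q = 1
noise16k = resample(noise,p,q);     % noise at 16 kHz, twice as many samples
```

The caveat above still applies: the resampled noise has no energy in the upper half of the 16 kHz band, so a noise recording made at 16 kHz is the better choice if you can find one.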

Regarding this question:

Also, where is it set how many voices will be selected for training? For example, I want to use 25 random sounds from my dataset for training.

Do you mean 25 speech audio files from your dataset? You can use subset to reduce your dataset to the required number of files. For example, the code below picks 25 random files from the dataset as your training files.

if reduceDataset
    adsTrain = shuffle(adsTrain);
%     adsTrain = subset(adsTrain,1:1000);
    adsTrain = subset(adsTrain,1:25);
end
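
An equivalent way to pick the files is to draw the random indices yourself with randperm and pass them to subset. This is a sketch on a hypothetical file count; numFiles, numTrain and idx are illustrative names, and the datastore call is left as a comment because it needs your TIMIT folder.

```matlab
numFiles = 1000;   % hypothetical; in practice use numel(adsTrain.Files)
numTrain = 25;     % number of training files you want

numTrain = min(numTrain,numFiles);   % guard against small datasets
idx = randperm(numFiles,numTrain);   % unique random file indices

% adsTrain = subset(adsTrain,idx);   % apply to the real datastore
```

Calling rng with a fixed seed before randperm makes the selection repeatable between runs.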

HTH,

Brian

Comments

studentmatlaber on 25 Feb 2022

% D            = 48/8; % Decimation factor  <-CHANGE
% L            = floor( numel(audio)/D);    <-CHANGE
% audio        = audio(1:D*L);              <-CHANGE
%                                           <-CHANGE
% audio = src(audio);                       <-CHANGE
% reset(src)                                <-CHANGE

I don't understand how to make a change here.

Brian Hemmat on 25 Feb 2022

When I write <-CHANGE, that's just pointing out which lines were changed, not any further action for you to take.

The suggestion is to just comment out those lines, since they are not relevant after we changed the HelperGenerateSpeechDenoisingFeatures function to not apply any sample rate conversion. You could also just delete those lines.
