Can anyone tell me how to remove unvoiced or silent regions from an audio file?

Question

pranjal on 26 Dec 2014


https://kr.mathworks.com/matlabcentral/answers/168185-can-anyone-tell-me-how-to-remove-unvoiced-or-silenced-region-from-audio-file

Commented: krishna Chauhan on 14 Dec 2022

I have a .wav file containing voiced and unvoiced segments. I want to remove the unvoiced parts so that processing time can be reduced.

Comments: 4

huda farooqui on 1 Nov 2018

Hey, kindly tell me: I want to reduce the gap between words, but I don't want to remove it entirely. Removing the whole gap runs the words together, so we cannot detect the end of each word.

Image Analyst on 1 Nov 2018

Find the gaps but then when you stitch the segments together, insert a few elements of zeros.
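The stitching idea above could be sketched roughly as follows. This is only an illustration, not code from the thread: the synthetic two-"word" signal, the threshold of 0.05, the 201-sample envelope window, and the 0.1 s pause length are all assumptions, and movmax() requires R2016a or later.

```matlab
% Sketch: shorten long quiet gaps to a fixed short pause instead of deleting them.
Fs = 8000;                                  % assumed sample rate (Hz)
t = (0:Fs/4-1)'/Fs;                         % 0.25 s time base
word = 0.5*sin(2*pi*440*t);                 % stand-in for a spoken word
y = [word; zeros(Fs, 1); word];             % two "words" separated by 1 s of silence

threshold = 0.05;                           % assumed amplitude threshold
window = 201;                               % assumed envelope window, in samples
padLength = round(0.1*Fs);                  % keep 0.1 s of zeros per gap (assumed)

env = movmax(abs(y), window);               % moving-max envelope
isLoud = env >= threshold;

% Locate the first and last index of every loud segment.
d = diff([false; isLoud; false]);
starts = find(d == 1);
stops = find(d == -1) - 1;

% Stitch the segments together with a short run of zeros between them.
yShort = [];
for k = 1:numel(starts)
    yShort = [yShort; y(starts(k):stops(k))]; %#ok<AGROW>
    if k < numel(starts)
        yShort = [yShort; zeros(padLength, 1)]; %#ok<AGROW>
    end
end
```

Making padLength larger keeps the word boundaries easier to detect; making it smaller shortens the file more.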


Answer 1

Image Analyst on 26 Dec 2014


https://kr.mathworks.com/matlabcentral/answers/168185-can-anyone-tell-me-how-to-remove-unvoiced-or-silenced-region-from-audio-file#answer_163377

Edited: Image Analyst on 16 Jun 2019


See my program below. It finds where the envelope of the standard guitar demo file that ships with MATLAB is below 0.13 and cuts out those portions. Normal sine wave oscillations on a short time scale are retained so as not to alter the sound.

clc;    % Clear the command window.
close all;  % Close all figures (except those of imtool.)
clear;  % Erase all existing variables. Or clearvars if you want.
workspace;  % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
% Read in data and plot it.
[y, Fs] = audioread('guitartune.wav');
subplot(2, 1, 1);
plot(y, 'b-');
grid on;
title('Original Signal + Envelope', 'FontSize', fontSize);
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0 0 1 1]);
drawnow;
% Find the envelope by taking a moving max operation, imdilate.
envelope = imdilate(abs(y), true(1501, 1));
% Plot it.
hold on;
plot(envelope, 'r-', 'LineWidth', 2);
plot(-envelope, 'r-', 'LineWidth', 2);
legend('Data', 'Envelope');
% Save the x axis length so we can apply it to the edit plot
% so they are displayed on the same time frame
% So we can see how it got shorter.
xl = xlim();
% Find the quiet parts.
quietParts = envelope < 0.13; % Or whatever value you want.
% Cut out quiet parts and plot.
yEdited = y; % Initialize
yEdited(quietParts) = [];
subplot(2, 1, 2);
plot(yEdited, 'b-', 'LineWidth', 2);
title('Edited Signal, shorter because data was removed', 'FontSize', fontSize);
grid on;
% Make it plot over the same time range as the original
xlim(xl);

Comments: 5

Image Analyst on 28 Dec 2014


Try this code. It's a little different because I changed it to handle a stereo signal like your baby cry, and I adjusted the threshold. If it does the trick, please "Accept" the answer.

clc;    % Clear the command window.
close all;  % Close all figures (except those of imtool.)
clear;  % Erase all existing variables. Or clearvars if you want.
workspace;  % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 20;
% Read in data and plot it.
[y, Fs] = audioread('newborn cry.wav');
subplot(2, 1, 1);
plot(y, 'b-');
grid on;
caption = sprintf('Original Signal (%d elements) + Envelope', size(y, 1));
title(caption, 'FontSize', fontSize);
% Enlarge figure to full screen.
set(gcf, 'Units', 'Normalized', 'OuterPosition', [0 0 1 1]);
drawnow;
promptMessage = sprintf('Do you want to play the sound file,\nor Cancel to abort processing?');
titleBarCaption = 'Play Sound?';
button = questdlg(promptMessage, titleBarCaption, 'Yes', 'No', 'Cancel', 'Yes');
if strcmpi(button, 'Cancel')
  return;
elseif strcmpi(button, 'Yes')
  soundsc(y, Fs);
end
% Find the envelope by taking a moving max operation, imdilate.
meanOfBothChannels = mean(abs(y), 2);
envelope = imdilate(meanOfBothChannels, true(1501, 1));
% Plot it.
hold on;
plot(envelope, 'r-', 'LineWidth', 2);
plot(-envelope, 'r-', 'LineWidth', 2);
legend('Data', 'Envelope');
% Save the x axis length so we can apply it to the edit plot
% so they are displayed on the same time frame
% So we can see how it got shorter.
xl = xlim();
% Find the quiet parts.
quietParts = envelope < 0.05; % Or whatever value you want.
% Cut out quiet parts and plot.
yEdited = y; % Initialize
yEdited(quietParts, :) = []; % Erase quiet rows
subplot(2, 1, 2);
plot(yEdited, 'b-', 'LineWidth', 2);
caption = sprintf('Edited Signal (%d elements), shorter because data was removed', size(yEdited, 1));
title(caption, 'FontSize', fontSize);
grid on;
% Make it plot over the same time range as the original
xlim(xl);
promptMessage = sprintf('Do you want to play the edited sound file,\nor Cancel to abort processing?');
titleBarCaption = 'Play Sound?';
button = questdlg(promptMessage, titleBarCaption, 'Yes', 'No', 'Cancel', 'Yes');
if strcmpi(button, 'Cancel')
  return;
elseif strcmpi(button, 'Yes')
  soundsc(yEdited, Fs);
end

Alexandra on 13 May 2016

envelope = imdilate(meanOfBothChannels, true(1501, 1)); I don't understand how the value 1501 was chosen. Could someone explain it to me? Thanks.

Image Analyst on 14 May 2016

Edited: Image Analyst on 14 May 2016

It was probably trial and error until a value was found that worked well. By the way, there is now an envelope() function in the Signal Processing Toolbox:

[yupper,ylower] = envelope(x) returns the upper and lower envelopes of the input sequence, x, as the magnitude of its analytic signal. The analytic signal of x is found using the discrete Fourier transform as implemented in hilbert. The function initially removes the mean of x and adds it back after computing the envelopes. If x is a matrix, then envelope operates independently over each column of x.

There is also a movmax() function that is like imdilate() in that it takes the max in a moving window.
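As a rough sketch of how movmax() might stand in for the imdilate() call: the window can be picked as a duration and converted to samples, which makes the "1501" less arbitrary. The 44.1 kHz sample rate and the 34 ms duration below are assumptions (1501 samples is about 34 ms at 44.1 kHz; check the Fs returned by audioread for your own file), and the random signal is just a placeholder.

```matlab
% Sketch: choose the envelope window by duration rather than by sample count.
rng(0);                                     % reproducible placeholder data
Fs = 44100;                                 % assumed sample rate (Hz)
windowSeconds = 0.034;                      % assumed window duration (s)
window = 2*floor(windowSeconds*Fs/2) + 1;   % force an odd length so the window is centered

y = randn(Fs, 1);                           % placeholder signal; substitute your own audio
envMovmax = movmax(abs(y), window);         % moving maximum, no Image Processing Toolbox needed
```

A longer window bridges longer dips in the waveform, so fewer short pauses fall below the threshold; a shorter window follows the signal more tightly.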


Answer 2

Chad Greene on 26 Dec 2014


https://kr.mathworks.com/matlabcentral/answers/168185-can-anyone-tell-me-how-to-remove-unvoiced-or-silenced-region-from-audio-file#answer_163368


Hi Pranjal,

There are a number of ways to do this, but I cannot think of a perfect solution. The easiest way I can think of is to require that the levels exceed a certain threshold. However, you can't require each individual data point to exceed the threshold, because the waveform crosses zero on every oscillation. Instead, you'll need some moving measure of level, such as a moving max, moving RMS, or moving SPL. Here's a solution with moving max, using Aslak Grinsted's moving function.

First, load some data. I'm going to add a 12000 point stretch of low-volume noise in the middle:

load train
y = [y(1:4200);.01*randn(12000,1);y(4201:end)]; % resume at 4201 so sample 4200 is not duplicated
t = (0:length(y)-1)/Fs; % time

The sound signal looks and sounds like this:

plot(t,y,'blue')
box off; axis tight; hold on
xlabel('time (s)')
soundsc(y,Fs) % <- turn your speakers on

Now use moving to get the moving max centered over 35 data points and plot the moving maximum in red:

ymax = moving(y,35,@max);
plot(t,ymax,'red')

Now let's say we only want data whose moving max value exceeds 0.15. Create a clipped array corresponding to ymax>0.15:

yclipped = y(ymax>0.15); 
tclipped = (0:length(yclipped)-1)/Fs; 
figure
plot(tclipped,yclipped,'b')
box off; axis tight
xlabel('time (s)')
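For comparison, here is a minimal sketch of the moving RMS variant mentioned at the top of this answer, using the built-in movmean() (R2016a or later) instead of the File Exchange moving function. The threshold of 0.05 is an assumption chosen for this test signal and would need tuning for other data.

```matlab
% Sketch: threshold on a moving RMS level instead of a moving max.
rng(0);                                     % reproducible noise
load train                                  % provides y and Fs (ships with MATLAB)
y = [y(1:4200); 0.01*randn(12000, 1); y(4201:end)];

window = 35;                                % same window length as the moving max above
yrms = sqrt(movmean(y.^2, window));         % moving RMS level
threshold = 0.05;                           % assumed; tune for your data

yclipped = y(yrms > threshold);             % keep only samples in loud stretches
tclipped = (0:length(yclipped)-1)/Fs;       % time axis for the shortened signal
```

RMS averages over the window rather than taking the peak, so it tends to be less sensitive to isolated spikes than a moving max.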

Comments: 5

Image Analyst on 14 Dec 2022

@krishna Chauhan then there would be no modulation of air pressure and no sound except for a faint hiss. The waveform produced by the microphone would then, during those time periods, look just like a low level white noise, just as it would if there was no talking going on. Granted maybe a little higher signal than complete silence, but that just means you may have to raise your threshold slightly to ignore cases when someone is doing that. The algorithms would just remain the same, just the threshold parameter to divide voiced/talking from unvoiced/silence would be slightly different. Are you implying that the algorithm to find the talking vs unvoiced/silent parts of the signal would be different?

krishna Chauhan on 14 Dec 2022

Please take this as a discussion, since this question already has an accepted answer.

I read somewhere that these two things are different and that each calls for a different analysis.
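For what it's worth, one commonly described way to tell the two apart (not something proposed earlier in this thread) is to look at the zero-crossing rate alongside short-time energy: unvoiced sounds such as fricatives are noise-like and cross zero often, while silence has very low energy. A minimal sketch on synthetic data follows; the stand-in signals, the 20 ms frame length, and both thresholds are assumptions that would need tuning on real speech.

```matlab
% Sketch: frame-wise short-time energy and zero-crossing rate (ZCR).
rng(0);                                      % reproducible noise
Fs = 8000;                                   % assumed sample rate (Hz)
n = (0:Fs/2-1)';                             % 0.5 s per test segment
voicedLike = 0.5*sin(2*pi*150*n/Fs);         % low-frequency tone standing in for voiced speech
unvoicedLike = 0.1*randn(numel(n), 1);       % noise standing in for an unvoiced fricative
silence = 0.001*randn(numel(n), 1);          % faint background hiss
y = [voicedLike; unvoicedLike; silence];

frameLength = round(0.02*Fs);                % 20 ms frames (assumed)
numFrames = floor(numel(y)/frameLength);
frames = reshape(y(1:numFrames*frameLength), frameLength, numFrames);

energy = mean(frames.^2, 1);                       % short-time energy per frame
zcr = mean(abs(diff(sign(frames), 1, 1)) > 0, 1);  % fraction of sign changes per frame

energyThreshold = 1e-4;                      % assumed; separates silence from sound
zcrThreshold = 0.25;                         % assumed; separates noise-like from tonal frames

isSilence = energy < energyThreshold;
isUnvoiced = ~isSilence & zcr > zcrThreshold;
isVoiced = ~isSilence & zcr <= zcrThreshold;
```

With labels like these, you could drop only the silent frames, or the silent and unvoiced frames, depending on what the later processing needs.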
