i want to use LSTM based audio network to work with Live audio
Views: 4 (last 30 days)
Show older comments
Hello Matlab team,
I am using this example to work with my audio data set: https://www.mathworks.com/matlabcentral/fileexchange/74611-fault-detection-using-deep-learning-classification#examples_tab The model is trained on my dataset, but now I want to make the application live on my PC. For example, I have a mic and want to build an application that uses my trained model to predict the output.
Can you guide me or help me with that?
Regards,
Arslan Munaim
Answers (2)
jibrahim
27 Jul 2022
Hi Arslan,
There is a function in that repo (streamingClassifier) that should get the job done, in conjunction with an audio device reader:
% Create a microphone object
adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame = adr();
    % Pass to network
    scores = streamingClassifier(frame,M,S);
    % Use the scores any way you want
end
Comments: 5
Arslan Munim
28 Jul 2022
Edited: Arslan Munim on 28 Jul 2022
Thanks for your reply. I tried using streamingClassifier; however, I am trying to use the extract function instead of the extractFeatures function (because of dependency issues). With extract I can only use one feature at a time, but I trained the network with 11 features.
Can you please tell me how I can use the extract function in streamingClassifier? I am attaching my code for reference:
windowLength = 512;
overlapLength = 0;
aFE = audioFeatureExtractor('SampleRate',44100, ...
    'Window',hamming(windowLength,'periodic'), ...
    'OverlapLength',overlapLength, ...
    'spectralCentroid',true, ...
    'spectralCrest',true, ...
    'spectralDecrease',true, ...
    'spectralEntropy',true, ...
    'spectralFlatness',true, ...
    'spectralFlux',true, ...
    'spectralKurtosis',true, ...
    'spectralRolloffPoint',true, ...
    'spectralSkewness',true, ...
    'spectralSlope',true, ...
    'spectralSpread',true);
features = extract(aFE,audioIn);
% features = extractFeatures(audioIn);
% Normalize
features = ((features - M')./S');
[net,scores] = predictAndUpdateState(net,features);
jibrahim
28 Jul 2022
Hi Arslan,
The extract function also returns 11 features. For example, if you replace the existing function extractFeatures with this modified function, things should work the same:
function featureVector = extractFeatures2(x)
%#codegen
persistent afe
if isempty(afe)
    windowLength = 512;
    overlapLength = 0;
    afe = audioFeatureExtractor('SampleRate',44100, ...
        'Window',hamming(windowLength,'periodic'), ...
        'OverlapLength',overlapLength, ...
        'spectralCentroid',true, ...
        'spectralCrest',true, ...
        'spectralDecrease',true, ...
        'spectralEntropy',true, ...
        'spectralFlatness',true, ...
        'spectralFlux',true, ...
        'spectralKurtosis',true, ...
        'spectralRolloffPoint',true, ...
        'spectralSkewness',true, ...
        'spectralSlope',true, ...
        'spectralSpread',true);
end
featureVector = extract(afe,x);
end
The size of featureVector will be 1-by-11, each element in the vector representing one of your features.
Notice that I declared afe as persistent. This ensures the audio feature extractor is not recreated every time you call this function in your loop. The extractor goes through some one-time setup computations when you first call it; no need to waste time repeating those.
Arslan Munim
2 Aug 2022
I want to use a sample rate of 44100 Hz and SamplesPerFrame of 22000. The code works fine with SampleRate=16e3 and SamplesPerFrame=512, but when I increase the sample rate and samples per frame, the size of featureVector increases and I always get an incompatible-sizes error:
Arrays have incompatible sizes for this operation.
Error in streamingClassifier (line 15)
feature = ((features - M')./S');
Error in scMatlab (line 9)
scores = streamingClassifier(frame,M,S);
The feature vector generated with SampleRate=44100 and SamplesPerFrame=22000 is always x-by-11.
Can you please tell me how I can reduce it to just 1-by-11?
Thanks,
Arslan
jibrahim
2 Aug 2022
Hi Arslan,
Since you trained the network with a sample rate of 16 kHz, you will have to perform sample-rate conversion from 44.1 kHz to 16 kHz. This code is a possible implementation, where you essentially feed the network frames of length 512 sampled at 16 kHz, just like the original code:
% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3, ...
    Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D; % get as close to desired frame size
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=frameLength);
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame = adr();
    % Convert to 16 kHz
    frame = src(frame);
    % Save to buffer
    write(buff,frame)
    while buff.NumUnreadSamples >= 512
        frame = read(buff,512);
        % Pass to network
        scores = streamingClassifier(frame,M,S);
        % Use the scores any way you want
    end
end
Note that you can also potentially feed the network longer frames. That should also work, and is probably more efficient, as the network will run faster if you give it a long input (as opposed to multiple short ones):
% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D;
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=frameLength);
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame = adr();
    % Convert to 16 kHz
    frame = src(frame);
    % Save to buffer
    write(buff,frame)
    N = buff.NumUnreadSamples;
    L = floor(N/512);
    if L>0
        frame = read(buff,512*L);
        % Pass to network
        scores = streamingClassifier(frame,M,S);
        % Use the scores any way you want
    end
end
If you can't change the frame size on the microphone, then you can handle that using another buffer, for example:
% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D;
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=22000);
buffSRC = dsp.AsyncBuffer;
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame = adr();
    write(buffSRC,frame);
    frame = read(buffSRC,frameLength);
    % Convert to 16 kHz
    frame = src(frame);
    % Save to buffer
    write(buff,frame)
    N = buff.NumUnreadSamples;
    L = floor(N/512);
    if L>0
        frame = read(buff,512*L);
        % Pass to network
        scores = streamingClassifier(frame,M,S);
        % Use the scores any way you want
    end
end
Arslan Munim
9 Aug 2022
Thank you for your support; it was very helpful.
Now I want to use multiple mics for prediction. Can you please give me some idea of how I can use the streaming classifier with 3 or 4 mics for the prediction?
Thanks and have a nice day.
Regards,
Arslan
jibrahim
9 Aug 2022
Hi Arslan,
audioDeviceReader supports multi-mic devices. Use the ChannelMappingSource and ChannelMapping properties to map between device input channels and the output data.
This network was trained on mono data, so to adapt it to multi-channel data you either have to retrain your network on multi-channel data, or somehow combine your input channels into one channel (via a weighted sum, by selecting a particular channel, etc.) and proceed as above.
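For example, a minimal sketch of the weighted-sum approach (the two-channel count and equal weights here are assumptions; substitute your device's values):

```matlab
% Read a multichannel device and mix down to mono before classifying.
adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512, ...
    NumChannels=2);            % assumed two-channel device
M = 0;                         % normalization stats from your training
S = 1;
while 1
    frame = adr();             % frame is 512-by-2, one column per mic
    monoFrame = mean(frame,2); % equal-weight mixdown to one channel
    scores = streamingClassifier(monoFrame,M,S);
end
```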
Comments: 23
Arslan Munim
9 Aug 2022
Edited: Arslan Munim on 9 Aug 2022
Hello again,
Can I use the same trained model for data coming from multiple microphones? For example, in parallel: is it possible to give the data from multiple microphones to the model one by one and predict a class for each individual microphone?
jibrahim
9 Aug 2022
Yes, I think this is possible. For example, here is how you do two predictions on two independent sets of features:
[airCompNet,scores] = predictAndUpdateState(airCompNet,{randn(10,1),randn(10,1)})
So you can extract features from each channel and then get scores for each channel.
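Putting it together, a sketch of per-channel scoring (frame1 and frame2 are assumed to be mono frames from two mics; extractFeatures2, M, and S are as defined earlier in this thread):

```matlab
% Extract and normalize features for each channel, then batch both
% sequences into one call; each cell is one independent observation.
features1 = ((extractFeatures2(frame1) - M)./S).';
features2 = ((extractFeatures2(frame2) - M)./S).';
[net,scores] = predictAndUpdateState(net,{features1,features2});
% scores contains one row of class scores per channel
```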
Arslan Munim
9 Aug 2022
Hello again, thank you for your reply. In your reply:
[airCompNet,scores] = predictAndUpdateState(airCompNet,{randn(10,1),randn(10,1)})
here, are {randn(10,1),randn(10,1)} features extracted from 2 different channels?
Arslan Munim
17 Aug 2022
Edited: Walter Roberson on 19 Aug 2022
Hi jibrahim,
I am trying to read data from multiple mics, but it gives me this error every time. I read a frame from each microphone and send that data to the streaming classifier to predict the output, but it always fails on frame1 = adr1():
Error using audioDeviceReader/setup
A given audio device may only be opened once.
Error in audioDeviceReader/setupImpl
Error in multipleMic (line 10)
frame1 = adr1()
adr1 = audioDeviceReader(SampleRate=44.1e3,SamplesPerFrame=22000, ...
    Device="Microphone (4- USB PnP Sound Device)",BitDepth="16-bit integer");
adr2 = audioDeviceReader(SampleRate=44.1e3,SamplesPerFrame=22000, ...
    Device="Microphone (USB PnP Sound Device)",BitDepth="16-bit integer");
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
    % Read a frame of data from microphone
    frame1 = adr1()
    frame2 = adr2()
    % Pass to network
    [class] = streamingClassifier2(frame1,frame2,M,S)
    % Use the scores any way you want
end
function [class] = streamingClassifier2(frame1,frame2,M,S)
% This is a streaming classifier function
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('net.mat');
end
% Extract features using function
%features = extract(aFE,audioIn)
features1 = extractFeatures2(frame1);
features2 = extractFeatures2(frame2);
% Normalize
features1 = ((features1 - M)./S).';
features2 = ((features2 - M)./S).';
% Classify
[class] = classify(net,{features1,features2});
%[net,scores] = classify(net,feature)
end
jibrahim
17 Aug 2022
You should not create two audio device readers. It seems like you are reading from the same (multichannel) device. Create one audioDeviceReader and call it; it will return the output of each mic as a separate channel. Use the ChannelMappingSource and ChannelMapping properties to control which output channel corresponds to which microphone.
Arslan Munim
17 Aug 2022
No, I am reading from two different devices (microphones). In that case, can you tell me whether I should create separate audio device readers or use the channel mapping source?
Please guide me, and if you could share an example that would also be great.
Thanks a lot for your guidance.
Regards,
Arslan
jibrahim
17 Aug 2022
Hi Arslan,
You can only create one audioDeviceReader at a time; you can't use multiple ones. That is why we support devices that return multiple channels. I suggest you create one object with either device name, call it, and see what you get back (how many channels?).
Arslan Munim
17 Aug 2022
I tried that with only one device, and it returns one frame from that particular microphone. Can I make an audioDeviceReader for each microphone, read frames from each microphone, and then classify the frame read from each mic?
Jimmy Lapierre
17 Aug 2022
Hi Arslan, just to clarify, do you have one USB sound card with several mics hooked up to it, or several USB microphones?
Arslan Munim
17 Aug 2022
Edited: Arslan Munim on 17 Aug 2022
Hi Jimmy, I have multiple USB microphones connected to my laptop.
jibrahim
19 Aug 2022
Arslan, we support the scenario with one USB card with several mics hooked to it. You can't use audioDeviceReader to read from separate cards at the same time. Even if we did, since these different mics run on different clocks, I am not sure how you would achieve synchronization between them anyway.
One possible workaround is to use a different MATLAB session to read from the other microphone and send the data to the main MATLAB session via UDP. So, in another MATLAB session, run code like this:
sender = dsp.UDPSender(RemoteIPPort=25000);
src = audioDeviceReader;
while(1)
    frame = src();
    sender(frame);
end
Then, in the main MATLAB session, you can receive the audio:
rec = dsp.UDPReceiver(LocalIPPort=25000);
scope = timescope;
while(1)
    frame = rec();
    scope(frame);
end
This might work if your sound is in steady state and does not change often or fast. If synchronization between the mics becomes an issue, then one card with multiple mics attached to it is definitely the way to go.
Arslan Munim
19 Aug 2022
Hi jibrahim, thanks for your reply.
Could it work this way, for example: if I use one USB hub and connect my microphones to my laptop through it (so the USB hub has multiple microphones and is connected to the laptop), would that sync all the mics, since all of the microphones would be connected to one USB card?
jibrahim
19 Aug 2022
There should be one driver that aggregates the mics, so this will probably not work.
What device(s) are you working with?
Arslan Munim
19 Aug 2022
I am using a laptop to run MATLAB, where I run the live script to predict using the model.
I am using a couple of these microphones connected to my laptop via USB (Amazon link below, just for your information).
Yes, there is one driver aggregating all the microphones on the laptop.
https://www.amazon.de/Seacue-Omnidirektionaler-Kondensator-Interviews-Netzwerksingen/dp/B071171DBP/ref=asc_df_B071171DBP/?tag=googshopde-21&linkCode=df0&hvadid=310664364876&hvpos=&hvnetw=g&hvrand=8563425015364220221&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9068212&hvtargid=pla-378893081924&psc=1&th=1&psc=1&tag=&ref=&adgrpid=62550347900&hvpone=&hvptwo=&hvadid=310664364876&hvpos=&hvnetw=g&hvrand=8563425015364220221&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9068212&hvtargid=pla-378893081924
jibrahim
19 Aug 2022
If that is the case, then you should be able to open the aggregating device only once (with one audioDeviceReader) as a multichannel device. Find out the name of the aggregating device and choose that one.
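To see which device names audioDeviceReader can open on your system, you can enumerate them:

```matlab
% List the input devices visible to audioDeviceReader; if the driver
% exposes an aggregating device, it should appear under its own name here.
adr = audioDeviceReader;
devices = getAudioDevices(adr)
```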
Arslan Munim
19 Aug 2022
I am still getting only one column per frame (meaning output from only one device). Can you suggest a device that can aggregate all the microphones connected to it? Also, I still see the individual microphones in the device list.
jibrahim
19 Aug 2022
Perhaps this helps:
devinfo = audiodevinfo
See if there is another recognized device name you can use. My guess is that one name corresponds to the single USB card.
Arslan Munim
19 Aug 2022
I checked the inputs; there are individual inputs coming from each microphone, each with only one channel per frame.
devinfo.input(2)
ans =
struct with fields:
Name: 'Microphone (7- USB PnP Sound Device):1 (Windows DirectSound)'
DriverVersion: 'Windows DirectSound'
ID: 1
devinfo.input(4)
ans =
struct with fields:
Name: 'Microphone (7- USB PnP Sound Device):2 (Windows DirectSound)'
DriverVersion: 'Windows DirectSound'
ID: 3
jibrahim
19 Aug 2022
I am assuming you checked all the inputs (IDs 1, 2, etc.). If each one corresponds to just one mic, then MATLAB does not recognize an aggregating device. Check whether your system (outside MATLAB) recognizes an aggregating device or not.
Arslan Munim
19 Aug 2022
No, unfortunately not. It shows two different microphones outside MATLAB as well.
jibrahim
20 Aug 2022
OK, this helps. You will need other hardware (one device, multiple mics) for the system to recognize it. You could also give the UDP idea a shot, see how viable that is.
Arslan Munim
28 Sep 2022
Hi again,
I am trying to train my network after lowering BitsPerSample from 16 to 8. Every time I start training, it throws the warning below and terminates.
I tried different sample rates but get the same result every time. I also tried changing my layer structure and the InitialLearnRate (0.001), but I still get the same warning.
Warning: Training stopped at iteration 1 because training loss is NaN. Predictions using the output network might contain NaN values.
Model:
layers = [ ...
    sequenceInputLayer(size(trainingFeatures{1},1))
    lstmLayer(100,"OutputMode","sequence")
    dropoutLayer(0.1)
    lstmLayer(100,"OutputMode","last")
    fullyConnectedLayer(5)
    softmaxLayer
    classificationLayer];
miniBatchSize = 30;
validationFrequency = floor(numel(trainingFeatures)/miniBatchSize);
options = trainingOptions("adam", ...
    "MaxEpochs",100, ...
    "MiniBatchSize",miniBatchSize, ...
    "Plots","training-progress", ...
    "Verbose",false, ...
    "Shuffle","every-epoch", ...
    "LearnRateSchedule","piecewise", ...
    "LearnRateDropFactor",0.1, ...
    "LearnRateDropPeriod",20, ...
    "InitialLearnRate",0.001, ...
    "ValidationData",{validationFeatures,adsValidation.Labels}, ...
    "ValidationFrequency",validationFrequency);
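In case it helps, a diagnostic sketch (the GradientThreshold value and lower learning rate are guesses, not a confirmed fix): 8-bit audio has a much coarser amplitude grid, so silent or constant frames are more likely to produce NaN/Inf features, and a large gradient step can also drive the loss to NaN:

```matlab
% Check the extracted feature sequences for NaN/Inf before training;
% remember to drop the matching labels for any sequences you remove.
bad = cellfun(@(f) any(~isfinite(f(:))), trainingFeatures);
fprintf('%d of %d sequences contain NaN/Inf features\n', nnz(bad), numel(bad));
% Gradient clipping and a lower learning rate can also keep the loss finite:
options = trainingOptions("adam", ...
    "GradientThreshold",1, ...   % clip gradients (value is a guess)
    "InitialLearnRate",1e-4, ... % lower than the original 0.001
    "MiniBatchSize",30);
```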
Regards,
Arslan