Incorrect input size. The input images must have a size of [98 50 1].

Question

Matlab dubai on 1 Feb 2021


Direct link to this question

https://kr.mathworks.com/matlabcentral/answers/732463-incorrect-input-size-the-input-images-must-have-a-size-of-98-50-1

Answered: Gabriele Bunkheila on 24 May 2021


I am trying out voice command recognition with Audio Toolbox,

but whenever I load a custom audio file I get the error "Incorrect input size. The input images must have a size of [98 50 1].".

Could you please guide me?

% Audio detection
%% Speech Command Recognition Using Deep Learning
% This example shows how to train a deep learning model that detects the
% presence of speech commands in audio. The example uses the Speech
% Commands Dataset [1] to train a convolutional neural network to recognize
% a given set of commands.
%
% To train a network from scratch, you must first download the data set. If
% you do not want to download the data set or train the network, then you
% can load a pretrained network provided with this example and execute the
% next two sections of the example: _Recognize Commands with a Pre-Trained
% Network_ and _Detect Commands Using Streaming Audio from Microphone_.
%% Recognize Commands with a Pre-Trained Network
% Before going into the training process in detail, you will use a
% pre-trained speech recognition network to identify speech commands.
%%
% Load the pre-trained network.
load('commandNet.mat')
 
%%
% The network is trained to recognize the following speech commands:
%
% * "yes"
% * "no"
% * "up"
% * "down"
% * "left"
% * "right"
% * "on"
% * "off"
% * "stop"
% * "go"
%
%%
% Load a short speech signal where a person says "stop".
[x,fs] = audioread('audio.flac');

%%
% Listen to the command.
sound(x,fs)
 
%%
% The pre-trained network takes auditory-based spectrograms as inputs. You
% will first convert the speech waveform to an auditory-based spectrogram.
%%
% Use the function |helperExtractAuditoryFeatures| to compute the auditory
% spectrogram. You will go through the details of feature extraction later
% in the example.
auditorySpect = helperExtractAuditoryFeatures(x,fs);
%%
% Classify the command based on its auditory spectrogram.
command = classify(trainedNet,auditorySpect)

Comments: 3

Jan on 2 Feb 2021

Please post a copy of the complete error message. Trailing dimensions of size 1 are ignored in MATLAB, but it is not clear which command requires [98 x 50] arrays as input. The term "images" also sounds strange.
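
The point about trailing singleton dimensions can be checked directly. This is a minimal sketch that uses placeholder arrays of zeros rather than real features; the variable names are illustrative only:

```matlab
% Placeholder arrays standing in for a 98-by-50 feature matrix.
A = zeros(98,50);
B = zeros(98,50,1);     % explicit trailing singleton dimension

sizeA = size(A);        % two-element size vector
sizeB = size(B);        % the trailing 1 is not reported
thirdDim = size(A,3);   % dimensions beyond ndims(A) report size 1
sameSize = isequal(sizeA, sizeB);
```

A 98-by-50 matrix therefore already satisfies a required size of [98 50 1], so a mismatch has to come from the first two dimensions.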

Cristina Balint on 19 May 2021

Edited: Cristina Balint on 19 May 2021


Any updates on this issue? I have the same problem when I run the script below. The size of the features vector is [78 50 1] and I get this error when I call the classify function:

"Error using DAGNetwork/classify (line 175)

Incorrect input size. The input images must have a size of [98 50 1]."

I am using MATLAB R2021a.

Also, the output of audioinfo('stop_command.flac') is:

Filename: '...MATLAB\Examples\R2021a\deeplearning_shared\DeepLearningSpeechRecognitionExample\stop_command.flac'
    CompressionMethod: 'FLAC'
          NumChannels: 1
           SampleRate: 16000
         TotalSamples: 12800
             Duration: 0.8000
                Title: []
              Comment: []
               Artist: []
        BitsPerSample: 16

[stopCmd, fs] = audioread('stop_command.flac');
sound(stopCmd./max(abs(stopCmd)), fs)
load('commandNet.mat', 'trainedNet')
disp(trainedNet)
afe = audioFeatureExtractor( ...
    'SampleRate',fs, ...
    'FFTLength',512, ...
    'Window',hann(round(0.025*fs),'periodic'), ...
    'OverlapLength',round(0.015*fs), ...
    'barkSpectrum',true);
setExtractorParams(afe,'barkSpectrum','NumBands',50,'WindowNormalization',false);
features = extract(afe, stopCmd);
size(features)
command = classify(trainedNet, features)


Answer 1

Gabriele Bunkheila on 24 May 2021


Direct link to this answer

https://kr.mathworks.com/matlabcentral/answers/732463-incorrect-input-size-the-input-images-must-have-a-size-of-98-50-1#answer_707103


Hi Cristina, I understand you are now sorted but I am including some more info below in case it can help others.

The network needs to see exactly the same type of input format it was trained with. In particular, the size of the input depends on:

1. The parameters of audioFeatureExtractor: on one hand the settings of the feature extraction algorithm (e.g. NumBands), on the other the buffering parameters (in this case Window and OverlapLength)
2. The length of the actual input waveform segment

In this example, you correctly honored (1) but not (2). This pre-trained network requires every waveform segment to be exactly 1 s long (16000 samples at a sample rate of 16 kHz). However, the test segment stop_command.flac contains only 12800 samples. The most common approach is simply to pad it with zeros at the end, as in the fourth line of the code below:

[stopCmd, fs] = audioread('stop_command.flac');
len = 1; % in seconds
N = len*fs;
stopCmdIn = [stopCmd(:,1); zeros(N-size(stopCmd,1),1,'like',stopCmd)];
afe = audioFeatureExtractor( ...
    'SampleRate',fs, ...
    'FFTLength',512, ...
    'Window',hann(round(0.025*fs),'periodic'), ...
    'OverlapLength',round(0.015*fs), ...
    'barkSpectrum',true);
setExtractorParams(afe,'barkSpectrum','NumBands',50,'WindowNormalization',false);
features = extract(afe, stopCmdIn);
size(features)
ans = 1×2
    98    50
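
Both the 98 rows obtained here and the 78 rows reported in the question are consistent with simple framing arithmetic. The sketch below assumes that only complete windows are counted, with no implicit padding. That rule is an assumption on my part that happens to reproduce both numbers in this thread, and `numFrames` is a hypothetical helper name, not part of any toolbox:

```matlab
fs         = 16000;                    % sample rate reported by audioinfo
winLength  = round(0.025*fs);          % 400 samples, as in 'Window'
overlapLen = round(0.015*fs);          % 240 samples, as in 'OverlapLength'
hopLength  = winLength - overlapLen;   % 160 samples between window starts

% Assumed rule: count only windows that fit entirely inside the signal.
numFrames = @(numSamples) floor((numSamples - winLength)/hopLength) + 1;

framesOriginal = numFrames(12800);     % the 0.8 s clip as recorded
framesPadded   = numFrames(16000);     % the clip padded to 1 s
```

With these values `framesOriginal` is 78 and `framesPadded` is 98, which is why the unpadded clip fails the [98 50 1] check.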

Better padding approaches tend to distribute the zeros across both the start and the end, as in the function helperExtractAuditoryFeatures.m (lines 37-46) that ships with the example Speech Command Recognition Using Deep Learning. To open it, execute the following in your local MATLAB installation:

openExample('deeplearning_shared/DeepLearningSpeechRecognitionExample')

edit helperExtractAuditoryFeatures
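
For readers who cannot open the example, here is a rough sketch of the symmetric padding idea. It is not a copy of helperExtractAuditoryFeatures.m. The variable names, the synthetic input, and the center-cropping of clips that are too long are all my own assumptions:

```matlab
fs        = 16000;
targetLen = 1*fs;            % the network expects 1 s segments
x = randn(12800,1);          % synthetic stand-in for the 0.8 s clip

numToPad = targetLen - size(x,1);
if numToPad >= 0
    % Split the zeros as evenly as possible between front and back.
    padFront = floor(numToPad/2);
    padBack  = ceil(numToPad/2);
    xOut = [zeros(padFront,1,'like',x); x; zeros(padBack,1,'like',x)];
else
    % Assumption: clips longer than the target are center-cropped.
    startIdx = floor(-numToPad/2) + 1;
    xOut = x(startIdx:startIdx+targetLen-1);
end
```

The resulting `xOut` always has `targetLen` samples, so the extracted features should come out as 98-by-50 with the audioFeatureExtractor settings shown above.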
