Issues with LSTM prediction due to normalization layer settings

Views: 34 (last 30 days)
Patrick Sontheimer on 22 Sep 2023
Commented: Patrick Sontheimer on 18 Oct 2023
Hello, I've recently tried to create an LSTM seq2seq model using multiple-input, multiple-output data. It's simulation data, so the time steps are correlated, and I use 'sequence' as the output mode for all LSTM layers. I've had a look at the tutorial cases, and my situation most closely resembles the turbofan tutorial: https://www.mathworks.com/help/deeplearning/ug/sequence-to-sequence-regression-using-deep-learning.html
I tried both manual normalization and the sequenceInputLayer normalization options. In the latter case the predictions are off. I'll attach what I'm doing as code below (I left out the sequence sorting, which can be found in the turbofan tutorial). The code uses a noisy linear trend for training and validation instead of my real data. I'll attach some prediction plots using the actual data. I've confirmed that the example below reproduces the same issue as my code.
The alternative to the code below is to do all steps the same, but instead use
net = trainNetwork(XTrain,YTrain,Layers,options);
for training and
sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric")
as the input layer (the full alternative layer array is sketched right after the plotting code below). Finally, the plots can be created with:
PYVal = predict(net,XVal,'MiniBatchSize',1);
% Which sequence to plot
i = 1;
% Plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight");
for j = 1:numResponses
    nexttile
    TargetY = YVal{i}(j,:);
    PredictionY = PYVal{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots
    set(gca,'xtick',[],'ytick',[])
end
Can someone explain to me what is not working correctly with the sequenceInputLayer options and how to fix it?
Now the code (it can't run in the preview, so you may have to run it locally):
Note: I edited this post. It occurred to me that providing you with an example using completely randomly distributed training and validation data doesn't give the ANN any valuable learnable patterns.
% Define parameters
numSequences = 3;
numFeatures = 4;
numResponses = 5;
numTimesteps = 100;
Interval = [0.1 0.9];
StartTimesteps = round(numTimesteps*Interval(1),1)
EndTimesteps = round(numTimesteps*Interval(2),-1)
% Initialize cells
XTrain = cell(numSequences,1);
YTrain = cell(numSequences,1);
XVal = cell(numSequences,1);
YVal = cell(numSequences,1);
% Initialize normalized cells
NX_Train = cell([numSequences 1]);
NY_Train = cell([numSequences 1]);
NX_Val = cell([numSequences 1]);
NY_Val = cell([numSequences 1]);
% Initialize Help Variables
HelpXT = zeros(numFeatures,numTimesteps);
HelpXV = zeros(numFeatures,numTimesteps);
HelpYT = zeros(numResponses,numTimesteps);
HelpYV = zeros(numResponses,numTimesteps);
% Fill Input Data with a noisy function
for s = 1:numSequences
    for i = 1:numFeatures
        for j = 1:StartTimesteps
            HelpXT(i,j) = randn + i*5;
            HelpXV(i,j) = randn + i*5;
        end
        for j = StartTimesteps:EndTimesteps
            HelpXT(i,j) = randn + i*5 + 0.2*j;
            HelpXV(i,j) = randn + i*5 + 0.2*j;
        end
        for j = EndTimesteps:numTimesteps
            k = 0.2*EndTimesteps;
            HelpXT(i,j) = randn + i*5 + k;
            HelpXV(i,j) = randn + i*5 + k;
        end
    end
    XTrain{s} = HelpXT;
    XVal{s} = HelpXV;
end
clear k
% Fill Output Data with noisy linear trend
for s = 1:numSequences
    for i = 1:numResponses
        for j = 1:StartTimesteps
            HelpYT(i,j) = randn + i*5;
            HelpYV(i,j) = randn + i*5;
        end
        for j = StartTimesteps:EndTimesteps
            HelpYT(i,j) = randn + i*5 + 0.2*j;
            HelpYV(i,j) = randn + i*5 + 0.2*j;
        end
        for j = EndTimesteps:numTimesteps
            k = 0.2*EndTimesteps;
            HelpYT(i,j) = randn + i*5 + k;
            HelpYV(i,j) = randn + i*5 + k;
        end
    end
    YTrain{s} = HelpYT;
    YVal{s} = HelpYV;
end
clear k
% Normalize the first dataset
[NX_Train{1},SX_Train] = mapminmax(XTrain{1});
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
% Normalize all remaining datasets using the same options
for i = 2:numel(XTrain)
    NX_Train{i} = mapminmax('apply',XTrain{i},SX_Train);
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NX_Val{i} = mapminmax('apply',XVal{i},SX_Train);
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
% Define network options:
numHiddenUnits = 3;
miniBatchSize = 1;
% Define network architecture
Layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,"Name",'dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,"Name",'dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
% Define training options
maxEpochs = 100;
InitialLearnRate = 1e-2;
Shuffle = 'every-epoch';
Plots = 'training-progress';
GradientThreshold = 1;
Verbose = 0;
ValidationData = {NX_Val, NY_Val};
ValidationFrequency = 1;
OutputNetwork = 'best-validation-loss';
L2Regularization = 0.05;
% Save training options
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',InitialLearnRate, ...
    'GradientThreshold',GradientThreshold, ...
    'Shuffle',Shuffle, ...
    'Plots',Plots, ...
    'Verbose',Verbose, ...
    'ValidationData',ValidationData, ...
    'ValidationFrequency',ValidationFrequency, ...
    'OutputNetwork',OutputNetwork, ...
    'L2Regularization',L2Regularization);
% Train the network
net = trainNetwork(NX_Train,NY_Train,Layers,options);
% Predict with the network on the validation data
PN_YVal = predict(net,NX_Val,'MiniBatchSize',1);
% initialize renormalized values
A = cell(size(XTrain,1),1); % XTrain
B = cell(size(XTrain,1),1); % YTrain
C = cell(size(XTrain,1),1); % XVal
D = cell(size(XTrain,1),1); % YVal
E = cell(size(XTrain,1),1); % PYVal
% renormalize data
% You can compare elements of A with XTrain, etc., as a sanity check
for i = 1:size(XTrain,1)
    A{i} = mapminmax('reverse',NX_Train{i},SX_Train);
    B{i} = mapminmax('reverse',NY_Train{i},SY_Train);
    C{i} = mapminmax('reverse',NX_Val{i},SX_Train);
    D{i} = mapminmax('reverse',NY_Val{i},SY_Train);
    E{i} = mapminmax('reverse',PN_YVal{i},SY_Train);
end
% Which sequence to plot
i = 1;
% Plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight");
for j = 1:numResponses
    nexttile
    TargetY = D{i}(j,:);
    PredictionY = E{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots
    set(gca,'xtick',[],'ytick',[])
end

Accepted Answer

Neha on 12 Oct 2023
Hi Patrick,
I understand that you are facing issues while normalizing the training data for the LSTM, and that you do not get correct predictions when the sequenceInputLayer normalization options are used instead of mapminmax. When you used mapminmax, you scaled both the X and the Y data, but when you normalized at the input layer, only the input data was normalized.
You can therefore normalize the output data using mapminmax and rescale the input data at the sequenceInputLayer:
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
for i = 2:numel(XTrain)
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
If you also specify Max and Min, the normalization is analogous to mapminmax, but this is not mandatory; specifying only the type of normalization is sufficient.
sequenceInputLayer(numFeatures, "Normalization","rescale-symmetric", "Max", max(XTrain{1},[],2),"Min", min(XTrain{1},[],2))
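With this setup, training and prediction could look roughly like the following (a sketch assembled from the pieces of the question's code: the raw XTrain/XVal go into the network, and only the targets and predictions are scaled with SY_Train):
% Train on raw inputs (the input layer rescales X) and mapminmax-normalized targets
% (ValidationData in options would then be {XVal, NY_Val})
net = trainNetwork(XTrain,NY_Train,Layers,options);
% Predict on the raw validation inputs and undo only the output scaling
PN_YVal = predict(net,XVal,'MiniBatchSize',1);
PYVal = cell(size(PN_YVal));
for i = 1:numel(PN_YVal)
    PYVal{i} = mapminmax('reverse',PN_YVal{i},SY_Train);
end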
In general, it is not necessary to normalize the output data of an LSTM network; it depends on the specific task and on the range of values the output can take. If the output values span a wide range, have high variance, or are sensitive to scale differences, applying normalization can be beneficial.
Hope this helps!
1 Comment
Patrick Sontheimer on 18 Oct 2023
Hello Neha,
Thank you for your answer, it's much appreciated. I think you're right. In addition, I use several sequences and had so far only normalized with the first sequence's max and min values as a stopgap measure. From now on I'll normalize all values without the sequence input layer options, and I'll calculate the min and max values from all sequences combined, both for the input and the output. Scale difference is a concern for the outputs. I've asked around and was told I might appreciate a custom loss function to address this, since it can weight differently scaled outputs with different priorities and removes the need to normalize the output, which makes the RMSE interpretable again (same dimension as the output values). I still have issues with the accuracy of my models, but I think the original question has been addressed.
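For reference, a minimal sketch of what normalizing with min/max values pooled from all sequences could look like (same variable names as in the question; the mapminmax settings computed on the concatenated training sequences are then applied to every training and validation sequence):
% Compute mapminmax settings from all training sequences combined
[~,SX_Train] = mapminmax([XTrain{:}]);
[~,SY_Train] = mapminmax([YTrain{:}]);
% Apply the same settings to every sequence
for i = 1:numel(XTrain)
    NX_Train{i} = mapminmax('apply',XTrain{i},SX_Train);
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NX_Val{i} = mapminmax('apply',XVal{i},SX_Train);
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end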
