Avoid exploding/vanishing gradient problem with NARX nets?

Question

I am performing system identification using neural networks with 5 inputs and 1 output. NARX networks give good results when the gradients stay stable during training. However, I often run into exploding/vanishing gradient problems when training a NARX network in closed loop. I can observe this in the nntraintool window: the gradient diverges and becomes unstable, then the maximum mu stopping criterion is triggered and ends training prematurely. Note that I first train the NARX network in open loop with a performance goal of 1e-09. I then close the loop and retrain in closed-loop form, using the open-loop weights and biases as initial values.

I would like to keep the NARX network architecture, since it performs quite well on new data when training isn't interrupted by an exploding gradient. Do you have any strategies or examples for avoiding exploding/vanishing gradient problems with NARX networks? I can't seem to find any discussions, documentation, or examples covering this issue.

One workaround that I've read about is to use a leaky ReLU activation function. However, I do not see an option like this for a NARX network's hidden layer transfer function. The closest I can find is 'poslin', but I still run into similarly unstable gradients. Another workaround I've seen is to use an LSTM or GRU network. However, none of the LSTM/GRU networks I have trained has reached the same level of performance as the NARX network.

Answer 1

Anshika Chaurasia on 1 Sep 2020

Hi Chris,

As you mentioned, one possible workaround is to use a leaky ReLU activation function as the NARX network's hidden layer transfer function. Consider the following steps to replace the tansig transfer function with leaky ReLU:

Generate MATLAB code for the NARX net. Refer to the documentation on generating a command-line function.

Define the leaky ReLU function in the generated MATLAB code as:

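% ===== MODULE FUNCTIONS ========
% Leaky ReLU Transfer Function
% n is a matrix (neurons x samples), so the rule is applied element-wise.
% A bare "if n>=0" would be true only when every element is non-negative.
function a = leakyrelu_apply(n,scale,~)
    a = n;                      % pass non-negative entries through unchanged
    neg = n < 0;                % logical mask of the negative entries
    a(neg) = scale*n(neg);      % scale only the negative entries
end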

In the generated script, make the following change within myNeuralNetworkFunction(X,Xi,~):

% Time loop
for ts=1:TS
    ...
    
    % Layer 1
    a1 = leakyrelu_apply(repmat(b1,1,Q) + IW1_1*tapdelay1 + IW1_2*tapdelay2,0.01);
   ....
end
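As a quick sanity check (a sketch, not part of the generated code), the transfer function should act element-wise, because the layer input is a matrix with one column per sample. The name `leaky` and the test matrix are made up for illustration, and the `max` form below matches the leaky ReLU definition only for 0 < scale < 1:

```matlab
% Illustrative one-line leaky ReLU; equivalent to the masked version
% only when 0 < scale < 1 (assumption).
leaky = @(n,scale) max(n, scale*n);

n = [-2 3; 0 -1];            % mixed-sign layer input (neurons x samples)
a = leaky(n, 0.01);

% Negative entries are scaled by 0.01, the rest pass through unchanged.
disp(a)
```

If every entry came back scaled (or none did), the function would be treating the matrix as a single condition instead of working per element.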

You could also try Glorot (Xavier) or He initialization to initialize the weights and biases. Refer to the linked page for an implementation of Glorot initialization.
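For reference, here is a minimal sketch of both initializers using their standard definitions. The helper names `glorotUniform` and `heNormal` and the layer sizes are hypothetical, not toolbox functions, and the commented lines at the end assume a configured network whose weight matrices already have the sizes being replaced:

```matlab
rng(0);                                   % reproducible draw for the example
numIn  = 5;                               % fan-in  (placeholder size)
numOut = 10;                              % fan-out (placeholder size)

% Glorot/Xavier uniform: U(-b, b) with b = sqrt(6/(fan_in + fan_out))
glorotUniform = @(nOut,nIn) sqrt(6/(nIn+nOut)) * (2*rand(nOut,nIn) - 1);
% He normal: zero-mean Gaussian with standard deviation sqrt(2/fan_in)
heNormal = @(nOut,nIn) sqrt(2/nIn) * randn(nOut,nIn);

W_glorot = glorotUniform(numOut, numIn);
W_he     = heNormal(numOut, numIn);
b0       = zeros(numOut, 1);              % biases are commonly started at zero

% With a configured toolbox network, the draws would replace the defaults
% before training, for example (sizes must match the existing matrices):
%   [nOut, nIn] = size(net.IW{1,1});
%   net.IW{1,1} = glorotUniform(nOut, nIn);
%   net.b{1}    = zeros(nOut, 1);
```

He initialization is usually paired with ReLU-type transfer functions, while Glorot is the usual choice for tansig-like ones.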

Comments

Anshika Chaurasia on 2 Sep 2020

Hi Chris,

For a1, a2, etc., you only need to define the leakyrelu_apply() function once and then call it wherever required.
Glorot or He initialization is useful for avoiding the vanishing/exploding gradient problem. So, if applying the leaky ReLU activation doesn't solve your problem, try these weight initialization methods.
You can implement the leaky ReLU transfer function only within myNeuralNetworkFunction(X,Xi,~), which is available only after training. Hence, generate the script (after training), change the transfer function, and then retrain the modified network using myNeuralNetworkFunction(X,Xi,~).

Chris P on 2 Sep 2020

Oh, I was unaware that you could retrain a network like that. I've been using the net variables in the workspace. Are you proposing instead that I can retrain the network using the generated script? Is there an example of this?

Answer 2

Greg Heath on 1 Sep 2020

Use a higher (looser) open-loop performance goal. Then lower the value after the loop is closed.

It's been years since I've done this so I can't give exact numerical details.
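A rough sketch of that two-stage schedule, assuming the Deep Learning Toolbox is available. The toy data, delays, hidden layer size, and both goal values are placeholders chosen for illustration, not recommended settings:

```matlab
% Toy data standing in for the real 5-input, 1-output series (assumption).
rng(0);
N = 200;
u = rand(5, N);
y = zeros(1, N);
for k = 3:N
    y(k) = 0.5*y(k-1) - 0.2*y(k-2) + 0.1*sum(u(:,k-1));
end
X = con2seq(u);
T = con2seq(y);

% Stage 1: open loop (series-parallel) with a deliberately loose goal.
net = narxnet(1:2, 1:2, 10);          % delays and hidden size are placeholders
net.trainParam.showWindow = false;    % suppress the training GUI
net.trainParam.goal = 1e-4;           % looser than 1e-9; placeholder value
[Xo, Xio, Aio, To] = preparets(net, X, {}, T);
net = train(net, Xo, To, Xio, Aio);

% Stage 2: close the loop, keep the open-loop weights, tighten the goal.
netc = closeloop(net);
netc.trainParam.showWindow = false;
netc.trainParam.goal = 1e-6;          % placeholder value
[Xc, Xic, Aic, Tc] = preparets(netc, X, {}, T);
netc = train(netc, Xc, Tc, Xic, Aic);

% Closed-loop performance on the training series.
Yc = netc(Xc, Xic, Aic);
closedLoopMSE = perform(netc, Tc, Yc);
```

The idea is that the open-loop stage only needs to deliver a reasonable starting point; the tight fit is left to the closed-loop stage.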

However, I recall always scaling my non-integer inputs and targets to have zero mean and unit variance.
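The scaling step can be sketched in plain MATLAB as below. The data are random placeholders with one row per signal, and the variable names are made up for the example. The toolbox function mapstd performs the same row-wise standardization and can also reverse it.

```matlab
rng(0);
x = 50 + 10*randn(5, 200);        % placeholder inputs, one row per signal
t = 3*randn(1, 200) - 7;          % placeholder target

% Row-wise mean and standard deviation
mu_x = mean(x, 2);  sd_x = std(x, 0, 2);
mu_t = mean(t, 2);  sd_t = std(t, 0, 2);

% Standardize to zero mean and unit variance
% (implicit expansion needs R2016b or later)
xn = (x - mu_x) ./ sd_x;
tn = (t - mu_t) ./ sd_t;

% Map a scaled target or network output back to the original units
t_back = tn .* sd_t + mu_t;
```

Keep the training-set mean and standard deviation and reuse them for any new data, rather than recomputing them per data set.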

Hope this helps.

Greg

Comments

Greg Heath on 1 Sep 2020

I was also puzzled by this answer. However, after spending an unreasonable amount of time trying to solve the puzzle, I gave up, added it to the bag of "Greg's Frustrations" and kept on truckin'.

However, your ideas are certainly welcome!

Greg

Chris P on 1 Sep 2020

Okay sounds good
