
The function dlgradient only returns zeros when applied to a neural net

7 views (last 30 days)
Steven on 8 Nov 2023
Commented: Steven on 9 Nov 2023
Dear all,
I am trying to replicate a machine learning method from an economics paper using the MATLAB Deep Learning Toolbox.
The setup of the problem is unconventional in that the neural net has to be applied several times within a single loss evaluation, so the available guides unfortunately do not cover my case.
I calculate the gradient of my neural net with the following code. However, the gradient is always zero, so I must be doing something wrong.
function [loss,gradients] = modelLoss(dlnet,X,T,par)
% Evaluate the net on the current states (rows 1:5 of X)
Y = forward(dlnet,normalize(X(1:5,:),par));
k1 = exp(Y(2,:));   % next-period capital implied by the net
b1 = exp(Y(3,:));   % next-period bonds implied by the net
% transitions of the exogenous processes, first shock draw (rows 6:8 of X)
rknext = X(1,:) * par.rho_rk + X(6,:);
rbnext = X(2,:) * par.rho_rb + X(7,:);
wnext  = X(3,:) * par.rho_w  + X(8,:);
X1 = vertcat(rknext, rbnext, k1, b1, wnext);
Y1 = forward(dlnet,normalize(X1,par));
% transitions of the exogenous processes, second shock draw (rows 9:11 of X)
rknext = X(1,:) * par.rho_rk + X(9,:);
rbnext = X(2,:) * par.rho_rb + X(10,:);
wnext  = X(3,:) * par.rho_w  + X(11,:);
X2 = vertcat(rknext, rbnext, k1, b1, wnext);
Y2 = forward(dlnet,normalize(X2,par));
loss = Stoch_loss(X,X1,X2,Y,Y1,Y2,T,par);
gradients = dlgradient(loss,dlnet.Learnables);
end
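The loss is evaluated through dlfeval, since dlgradient only traces operations on dlarray inputs inside a dlfeval call. A minimal sketch of such a training step follows; the batch size, random shock layout, and the adamupdate step are illustrative placeholders, not my exact code:
batch = 256;
X = dlarray(randn(11,batch),"CB");   % rows 1:5 states, rows 6:11 two shock draws
T = dlarray(zeros(1,batch),"CB");    % zero target for the squared residuals
avgG = []; avgSqG = [];              % Adam optimizer state
for iter = 1:1000
    [loss,gradients] = dlfeval(@modelLoss,dlnet,X,T,par);
    [dlnet,avgG,avgSqG] = adamupdate(dlnet,gradients,avgG,avgSqG,iter);
end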
These are the functions that calculate the loss:
function loss = Stoch_loss(X,X1,X2,Y,Y1,Y2,T,par)
% Multiplying the residuals from two independent shock draws gives an
% unbiased Monte Carlo estimate of the squared expected residuals
[R1_e1, R2_e1, R3_e1] = residuals(X,X1,Y,Y1,par);
[R1_e2, R2_e2, R3_e2] = residuals(X,X2,Y,Y2,par);
R_squared = (R1_e1 .* R1_e2) + (R2_e1 .* R2_e2) + (R3_e1 .* R3_e2);
loss = l2loss(R_squared,T);
end
function [R1,R2,R3] = residuals(X,X1,Y,Y1,par)
% Calculate the model residuals
rk = X(1,:);
rb = X(2,:);
w  = X(3,:);
k  = X(4,:);
b  = X(5,:);
c  = exp(Y(1,:)) + 0.1;
k1 = exp(Y(2,:));
b1 = exp(Y(3,:));
c1 = exp(Y1(1,:)) + 0.1;
k2 = exp(Y1(2,:));
rknext = X1(1,:);
rbnext = X1(2,:);
d  = k1 - par.rbar_rk.*exp(rk).*k;
d1 = k2 - par.rbar_rk.*exp(rknext).*k1;
R1 = 1 - par.beta .* (c1./c).^(-par.gamma) .* par.rbar.*exp(rbnext);
R2 = w + par.rbar.*exp(rb).*b - c - b1 - par.x0 .* abs_appr(d).^par.x1 - d;
R3 = (1 + d .* par.x0 .* par.x1 .* abs_appr(d).^(par.x1-2)) - ...
     par.beta .* (c1./c).^(-par.gamma) .* par.rbar_rk.*exp(rknext) .* ...
     (1 + d .* par.x0 .* par.x1 .* abs_appr(d1).^(par.x1-2));
end
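The helper abs_appr is not shown in the post; presumably it is a smooth approximation of abs so that dlgradient can differentiate through it at zero. A purely illustrative sketch of what such a helper might look like (the actual implementation, including the smoothing constant, may differ):
function y = abs_appr(x)
% Smooth, everywhere-differentiable approximation of abs(x);
% the smoothing constant 1e-6 is a hypothetical choice
y = sqrt(x.^2 + 1e-6);
end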
I have already solved this problem in Python with TensorFlow, so the general setup is correct; the issue must lie in my MATLAB implementation.
Does anyone have an idea how to solve this?
With kind regards,
Steven
2 Comments
Matt J on 8 Nov 2023
"However, the gradient is always zero, so I must be doing something wrong."
We don't know what "always" means. Surely there are cases where the gradients could all be zero, for example if negative activations are going into all of the network's ReLUs.
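A quick way to check this is to inspect a hidden activation for the current weights; a minimal sketch (the layer name 'relu_1' is hypothetical, check dlnet.Layers for the actual names):
A = forward(dlnet,normalize(X(1:5,:),par),'Outputs','relu_1');
fprintf('fraction of active ReLU units: %g\n', mean(extractdata(A) > 0,'all'));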
Steven on 9 Nov 2023
Thank you for responding. It was indeed a particular initialization that resulted in the gradients being zero.
The problem was solved by changing the initialization. Thanks!
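For reference, the initialization can be controlled through the WeightsInitializer option when the layers are created; a minimal sketch (the layer sizes here are hypothetical, not my actual network):
layers = [
    featureInputLayer(5)
    fullyConnectedLayer(32,'WeightsInitializer','he')
    reluLayer
    fullyConnectedLayer(3)];
dlnet = dlnetwork(layers);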


Answers (0)
