
I have a backpropagation doubt

27 views (last 30 days)
jvbx on 24 Jul 2024, 18:47
Commented: jvbx on 25 Jul 2024, 13:42
I'm trying to build a neural network with 2 hidden layers and one neuron in the output layer, without any toolboxes and using only matrix and vector multiplications. To do this, I created simple fictional data, as below, to help me with this task:
%Data
x = 1:1000;
y1 = sind(x);
y2 = sind(x+30);
y3 = cosd(x);
y4 = cosd(x+30);
y5 = cosd(x+45);
% y6 will be the desired output data that I would like my neural network
% to try to predict
y6 = (y1 + y2 + y3 + y4 + y5);
Then, I coded it the way I thought was right, but my neural network can't reach a good result, as shown below.
My doubt is whether the result isn't good because my implementation isn't right, or because I need to add more mechanisms to my neural network (like momentum, regularization, etc.)?
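For context, this is roughly what a classical momentum update looks like when added to plain gradient descent; it is only a minimal sketch under assumptions (v_w, w, grad, and beta are placeholder names, not variables from the code below):
% Classical (Polyak) momentum: keep a running "velocity" per weight matrix and
% step along it instead of the raw gradient. Initialise v_w = zeros(size(w))
% once before training.
beta = 0.9;                                % momentum coefficient (assumed value)
v_w  = beta * v_w - learning_rate * grad;  % grad is dE/dw for the current sample
w    = w + v_w;                            % replaces w = w - learning_rate * grad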
I will post my code below. Sorry about the names of some variables; I originally wrote this code in Portuguese. I will comment the code to help you understand it.
%Neural network architecture
n_h1 = 10;
n_h2 = 11;
n_out = 1;
%Adjustable parameters
w1 = rand(5,n_h1);
b1 = ones(1,n_h1)*rand(1,1);
w2 = rand(n_h1,n_h2);
b2 = ones(1,n_h2)*rand(1,1);
w_out = rand(n_h2,n_out);
b_out = ones(1,n_out)*rand(1,1);
sig_a = 1;
learning_rate = 0.001;
limiar = 0.002;
%Helpful variables
max_epocas = 1000;
conj_entrada = [y1;y2;y3;y4;y5];
erros_epoca = [];
%Backpropagation
for epoch = 1:max_epocas
for i = 1:size(conj_entrada,2)
if i == 1
soma = 0;
end
enter = conj_entrada(:,i);
h1_in = [w1;b1]'*[enter;1];
h1_out = sig(h1_in,sig_a,'False');
h2_in = [w2;b2]'*[h1_out;1];
h2_out = sig(h2_in,sig_a,'False');
saida_in = [w_out;b_out]'*[h2_out;1];
saida_out = saida_in;
erro = y6(i) - saida_out;
soma = soma + (erro^2);
%Here starts the part of the code where the gradients are being
%calculated. Note that, here, I tried to follow the chain rule.
%Let me try to help with the understanding: "saida" in Portuguese is
%like "output" in English, so when you read, for example,
%d_erro_d_saida_out, you need to know that this is the derivative of
%the error with respect to the output of the output layer. In the
%same way, "entrada" means input and "pesos" means weights.
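%For clarity, the chain rule for the output-layer weights works out to
%(using E = erro^2/2 as the per-sample loss, which matches the erro_atual
%formula at the end of the epoch loop):
%  dE/d(w_out) = dE/d(saida_out) * d(saida_out)/d(saida_in) * d(saida_in)/d(w_out)
%              = (-erro)         *  1 (linear output)       *  h2_out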
%output layer
%chain rule
d_erro_d_saida_out = -1*erro;
d_saida_d_entrada_out = 1; %linear
grad_saida = erro*d_saida_d_entrada_out;
d_entrada_d_pesos_out = h2_out;
d_erro_d_pesos_out = d_erro_d_saida_out*d_saida_d_entrada_out*d_entrada_d_pesos_out;
% Update the weights and bias
w_out = w_out -learning_rate*d_erro_d_pesos_out;
b_out = b_out -learning_rate*d_erro_d_saida_out*d_saida_d_entrada_out;
%Second hidden layer (The neighbor layer of the output layer)
%chain rule
d_erro_d_saida_h2 = -1*w_out*grad_saida;
d_saida_d_entrada_h2 = sig(h2_in,sig_a,'True');
grad_h2 = sum(grad_saida)*d_saida_d_entrada_h2;
d_entrada_d_pesos_h2 = h1_out;
d_erro_d_pesos_h2 = d_entrada_d_pesos_h2*grad_h2';
% Update the weights and bias
w2 = w2 -1*learning_rate*d_erro_d_pesos_h2;
b2 = b2 -1*learning_rate*sum(d_erro_d_saida_h2.*d_saida_d_entrada_h2,1);
%First hidden layer (the neighbor layer of the second hidden layer)
%chain rule
d_erro_d_saida_h1 = -1*w2*grad_h2;
d_saida_d_entrada_h1 = sig(h1_in,sig_a,'True');
grad_h1 = sum(grad_h2)*d_saida_d_entrada_h1; %so, from here, a 3x1 has to come out
d_entrada_d_pesos_h1 = enter;
d_erro_d_pesos_h1 = d_entrada_d_pesos_h1*grad_h1'; %the second variable has to result in a 1x3
% Update the weights and bias
w1 = w1 -1*learning_rate*d_erro_d_pesos_h1;
b1 = b1 -1*learning_rate*sum(d_erro_d_saida_h1.*d_saida_d_entrada_h1,1);
end
erro_atual = (soma/(2*size(x,2)));
erros_epoca = [erros_epoca;erro_atual];
if erros_epoca(epoch) < limiar
break
end
end
%testing the output of neural network
vetor_teste = 1:1000;
resposta_teste = zeros(1,size(vetor_teste,2));
for i = 1:size(vetor_teste,2)
enter_teste = conj_entrada(:,i);
h1_in_teste = [w1;b1]'*[enter_teste;1];
h1_out_teste = sig(h1_in_teste,sig_a,'False');
h2_in_teste = [w2;b2]'*[h1_out_teste;1];
h2_out_teste = sig(h2_in_teste,sig_a,'False');
saida_in_teste = [w_out;b_out]'*[h2_out_teste;1];
saida_out_teste = saida_in_teste; % the output activation function is linear
resposta_teste(i) = saida_out_teste;
end
plot(1:size(erros_epoca,1),erros_epoca);
% plot(x,y3,'b',vetor_teste,resposta_teste,'r');
The code of my sigmoid activation function is below:
function [vetor_saida] = sig(vetor_entrada, const1, derivative)
if strcmp(derivative, 'False') == 1
vetor_saida = 1 ./ (1 + exp(-const1 * vetor_entrada));
else
sig_value = sig(vetor_entrada, const1, 'False');
vetor_saida = const1 * sig_value .* (1 - sig_value);
end
end

Accepted Answer

Karan Singh on 25 Jul 2024, 6:12
Edited: Karan Singh on 25 Jul 2024, 6:12
Hi @jvbx,
I don't think you need to change much, just experiment with the current values. Here are a few points that I have found; please take a look at them:
  • Instead of initializing weights and biases with rand, consider using a more sophisticated initialization method like Xavier or He initialization, which can help with faster convergence (see the short sketch after this list).
  • Your learning rate might be too low. Try experimenting with different learning rates (e.g., 0.01, 0.1).
  • You are using a linear activation function for the output layer. Depending on the nature of your problem, you might want to use a different activation function; I have used a sigmoid.
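On the first bullet, both schemes scale random weights by the layer sizes, just with different factors; here is a minimal hedged sketch of the two common variants for the first layer (fan_in, fan_out, w1_xavier, and w1_he are illustrative names; the sqrt(2/fan_in) scale used in the code below is the one usually associated with He initialization):
fan_in  = 5;    % inputs to the first hidden layer (y1..y5)
fan_out = 10;   % neurons in the first hidden layer (n_h1 below)
w1_xavier = randn(fan_in, fan_out) * sqrt(2 / (fan_in + fan_out));  % Xavier (Glorot)
w1_he     = randn(fan_in, fan_out) * sqrt(2 / fan_in);              % He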
x = 1:1000;
y1 = sind(x);
y2 = sind(x+30);
y3 = cosd(x);
y4 = cosd(x+30);
y5 = cosd(x+45);
% y6 will be the desired output data that I would like my neural network
% to try to predict
y6 = (y1 + y2 + y3 + y4 + y5);
% Neural network architecture
n_h1 = 10;
n_h2 = 11;
n_out = 1;
% Adjustable parameters with Xavier initialization
w1 = randn(5, n_h1) * sqrt(2/5);
b1 = randn(1, n_h1) * sqrt(2/5);
w2 = randn(n_h1, n_h2) * sqrt(2/n_h1);
b2 = randn(1, n_h2) * sqrt(2/n_h1);
w_out = randn(n_h2, n_out) * sqrt(2/n_h2);
b_out = randn(1, n_out) * sqrt(2/n_h2);
sig_a = 1;
learning_rate = 0.01; % Adjusted learning rate
limiar = 0.002;
% Helpful variables
max_epocas = 1000;
conj_entrada = [y1; y2; y3; y4; y5];
erros_epoca = [];
% Backpropagation
for epoch = 1:max_epocas
soma = 0;
for i = 1:size(conj_entrada, 2)
enter = conj_entrada(:, i);
h1_in = [w1; b1]' * [enter; 1];
h1_out = sig(h1_in, sig_a, 'False');
h2_in = [w2; b2]' * [h1_out; 1];
h2_out = sig(h2_in, sig_a, 'False');
saida_in = [w_out; b_out]' * [h2_out; 1];
saida_out = saida_in; % Linear activation for output layer
erro = y6(i) - saida_out;
soma = soma + (erro^2);
% Gradient calculation and weight updates
% Output layer
d_erro_d_saida_out = -erro;
d_saida_d_entrada_out = 1; % Linear activation
grad_saida = d_erro_d_saida_out * d_saida_d_entrada_out;
d_entrada_d_pesos_out = h2_out;
d_erro_d_pesos_out = d_entrada_d_pesos_out * grad_saida';
% Update the weights and biases
w_out = w_out - learning_rate * d_erro_d_pesos_out;
b_out = b_out - learning_rate * grad_saida;
% Second hidden layer
d_erro_d_saida_h2 = w_out * grad_saida;
d_saida_d_entrada_h2 = sig(h2_in, sig_a, 'True');
grad_h2 = d_erro_d_saida_h2 .* d_saida_d_entrada_h2;
d_entrada_d_pesos_h2 = h1_out;
d_erro_d_pesos_h2 = d_entrada_d_pesos_h2 * grad_h2';
% Update the weights and biases
w2 = w2 - learning_rate * d_erro_d_pesos_h2;
b2 = b2 - learning_rate * grad_h2';
% First hidden layer
d_erro_d_saida_h1 = w2 * grad_h2;
d_saida_d_entrada_h1 = sig(h1_in, sig_a, 'True');
grad_h1 = d_erro_d_saida_h1 .* d_saida_d_entrada_h1;
d_entrada_d_pesos_h1 = enter;
d_erro_d_pesos_h1 = d_entrada_d_pesos_h1 * grad_h1';
% Update the weights and biases
w1 = w1 - learning_rate * d_erro_d_pesos_h1;
b1 = b1 - learning_rate * grad_h1';
end
erro_atual = (soma / (2 * size(x, 2)));
erros_epoca = [erros_epoca; erro_atual];
if erros_epoca(epoch) < limiar
break;
end
end
% Testing the output of neural network
vetor_teste = 1:1000;
resposta_teste = zeros(1, size(vetor_teste, 2));
for i = 1:size(vetor_teste, 2)
enter_teste = conj_entrada(:, i);
h1_in_teste = [w1; b1]' * [enter_teste; 1];
h1_out_teste = sig(h1_in_teste, sig_a, 'False');
h2_in_teste = [w2; b2]' * [h1_out_teste; 1];
h2_out_teste = sig(h2_in_teste, sig_a, 'False');
saida_in_teste = [w_out; b_out]' * [h2_out_teste; 1];
saida_out_teste = saida_in_teste; % Linear activation for output layer
resposta_teste(i) = saida_out_teste;
end
plot(1:size(erros_epoca, 1), erros_epoca);
% plot(x, y3, 'b', vetor_teste, resposta_teste, 'r');
% Sigmoid activation function
function [vetor_saida] = sig(vetor_entrada, const1, derivative)
if strcmp(derivative, 'False') == 1
vetor_saida = 1 ./ (1 + exp(-const1 * vetor_entrada));
else
sig_value = sig(vetor_entrada, const1, 'False');
vetor_saida = const1 * sig_value .* (1 - sig_value);
end
end
1 Comment
jvbx on 25 Jul 2024, 13:42
Thanks for your answer. I will code your suggestions and keep improving my code.
A couple of hours after posting my doubt, I noticed that I was making a little mistake: the desired output value was in the [-3, +3] interval or something like that, which of course is outside the range of the sigmoid function. So I just modified y6 to stay in [0, 1].
With this, the code worked, although the neural network still needs a lot of epochs to reach the desired error.
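For readers who want to reproduce that fix, here is a minimal sketch of one way to rescale y6 into [0, 1] (min-max normalization; the exact rescaling the commenter used is not shown, so this is an assumption):
% Min-max rescaling of the target so it stays within [0, 1]
y6_min = min(y6);
y6_max = max(y6);
y6 = (y6 - y6_min) / (y6_max - y6_min);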


More Answers (0)
