Levenberg-Marquardt algorithm as a custom training function using dlupdate

Hi everyone,
I'm trying to implement the Levenberg-Marquardt algorithm in MATLAB with dlupdate, as shown in the example "Use dlupdate to Train Network Using Custom Update Function" (https://de.mathworks.com/help/deeplearning/ref/dlupdate.html). The biggest challenge is calculating the Jacobian matrix.
The new Deep Learning Toolbox only provides the sgdm, rmsprop and adam algorithms; Levenberg-Marquardt is not implemented.
Is there an easy way to calculate the Jacobian matrix with dlgradient?
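For reference, a standard Levenberg-Marquardt step on the sum of squared residuals is written below with an N-by-P Jacobian of the residuals e_i = y_i - t_i (N observations, P parameters) and a damping factor mu. The Jacobian is taken of the residuals themselves, not of the squared residuals. If J is stored transposed (P-by-N, one column per observation), J^T J becomes J*J' and J^T e becomes J*e' in MATLAB.

```latex
E(\theta) = \tfrac{1}{2}\sum_{i=1}^{N} e_i(\theta)^2,
\qquad e_i(\theta) = y_i(\theta) - t_i,
\qquad J_{ik} = \frac{\partial e_i}{\partial \theta_k}

\Delta\theta = -\left(J^{\top}J + \mu I\right)^{-1} J^{\top} e,
\qquad \theta \leftarrow \theta + \Delta\theta
```

Since J^T e is the gradient of E, a large mu turns the step into a short gradient-descent step and a small mu turns it into a Gauss-Newton step. Implementations usually decrease mu after a step that lowers E and increase it after a step that does not.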
This is my code so far. It works, but it is very slow and I am not sure whether it is correct.
clc;
clear;
%% Random data
XTrain = rand(15,1000)*0.1;
XTrain = XTrain-5;
% Target: sum of squares of the 15 inputs, scaled by 1/10
TTrain = sum(XTrain.^2,1)/10;
%% Define Network
layers = [
    featureInputLayer(15)
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(1)
    functionLayer(@(x) x)   % identity output layer
    ];
net = dlnetwork(layers);
%% Training Options
miniBatchSize = 128;
numEpochs = 2;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
XTrain = dlarray(XTrain, 'CB');
TTrain = dlarray(TTrain, 'CB');
%% Train Network
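for epoch = 1:numEpochs
    % Shuffle data (index the observation dimension of both arrays).
    idx = randperm(numObservations);
    XTrain = XTrain(:,idx);
    TTrain = TTrain(:,idx);
    for iteration = 1:numIterationsPerEpoch
        % Get a batch of data.
        indices = (iteration-1)*miniBatchSize+1:iteration*miniBatchSize;
        XBatch = XTrain(:,indices);
        TBatch = TTrain(:,indices);
        % Loss, per-learnable Jacobian blocks and residuals of the batch
        [loss,J,e] = dlfeval(@modelLoss,net,XBatch,TBatch);
        e = extractdata(e);
        % dlupdate calls lmFunction once per learnable, so every weight and
        % bias block gets its own LM step (a block-diagonal approximation of
        % the full LM step over all parameters).
        updateFcn = @(param,Jblock) lmFunction(param,Jblock,e);
        net = dlupdate(updateFcn,net,J);
        % Report the loss
        fprintf('Loss: %f\n', extractdata(loss));
    end
end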
% Output:
% Loss: 646.412170
% Loss: 7.421673
% Loss: 11.123130
% Loss: 7.209948
% Loss: 5.831234
% Loss: 1.093452
% Loss: 0.242863
% Loss: 0.870988
% Loss: 0.398049
% Loss: 0.738563
% Loss: 0.304589
% Loss: 0.171688
% Loss: 0.072695
% Loss: 0.107126
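function [loss,J,e] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = mse(Y,T);
    % Residuals, not squared residuals: LM minimizes sum(e.^2), so the
    % Jacobian must be taken of e = Y - T.
    e = Y - T;
    % Jacobian of the residuals: for each learnable, a numel-by-numObs
    % matrix with one column per observation. Loop over the observations
    % once and fill all learnables from the same dlgradient call, instead
    % of recomputing every gradient once per learnable.
    numObs = size(X,2);
    numLearnables = height(net.Learnables);
    Jcells = cell(numLearnables,1);
    for j = 1:numLearnables
        Jcells{j} = zeros(numel(net.Learnables.Value{j}),numObs);
    end
    for i = 1:numObs
        % RetainData keeps the trace so dlgradient can be called repeatedly
        grad = dlgradient(e(i),net.Learnables,'RetainData',true);
        for j = 1:numLearnables
            Jcells{j}(:,i) = reshape(extractdata(grad.Value{j}),[],1);
        end
    end
    J = net.Learnables;
    J.Value = Jcells;
end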
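function parameters = lmFunction(parameters,J,e)
    % J: numel(parameters)-by-N Jacobian block, e: 1-by-N residuals
    % Damping factor; the update rule for mu is not yet implemented
    mu = 10;
    H = J*J';                          % Gauss-Newton approximation of the Hessian block
    I = eye(size(H,1));
    lmupdate = (H + mu*I) \ (J*e');    % damped Gauss-Newton step
    parameters = parameters - reshape(lmupdate,size(parameters));
end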
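The comment in lmFunction notes that the update rule for mu is still missing. A common heuristic is sketched below in plain MATLAB (no toolbox): if a trial step lowers the cost, accept it and divide mu by 10, otherwise reject it and multiply mu by 10. It uses a hypothetical two-parameter curve fit y = a*exp(b*x) with an analytic N-by-P Jacobian instead of the network; the names thetaTrue, model, residual and jacobian, the factor 10 and the stopping tolerance are illustrative assumptions, not values from the question.

```matlab
% Minimal Levenberg-Marquardt sketch with an adaptive damping factor mu.
% Hypothetical toy problem: fit y = a*exp(b*x) to noise-free data.
x = linspace(0,2,50)';
thetaTrue = [2; -1.5];
y = thetaTrue(1)*exp(thetaTrue(2)*x);

model    = @(th) th(1)*exp(th(2)*x);
residual = @(th) model(th) - y;                           % N-by-1 residuals
jacobian = @(th) [exp(th(2)*x), th(1)*x.*exp(th(2)*x)];   % N-by-P, d r_i / d theta_k

theta = [1; 0];   % initial guess
mu = 10;          % initial damping factor
r = residual(theta);
cost = sum(r.^2);
for k = 1:200
    Jm = jacobian(theta);
    g = Jm'*r;                                   % gradient of 0.5*sum(r.^2)
    if norm(g) < 1e-10
        break                                    % converged
    end
    step = (Jm'*Jm + mu*eye(numel(theta))) \ g;  % damped Gauss-Newton step
    thetaNew = theta - step;
    rNew = residual(thetaNew);
    costNew = sum(rNew.^2);
    if costNew < cost
        % Cost decreased: accept the step and move toward Gauss-Newton
        theta = thetaNew;
        r = rNew;
        cost = costNew;
        mu = max(mu/10, 1e-12);
    else
        % Cost did not decrease (or is NaN/Inf): reject the step and
        % move toward gradient descent
        mu = mu*10;
    end
end
disp(theta')
```

In the network setting, the same accept/reject logic would require evaluating the loss at the trial parameters before keeping the result of dlupdate.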
Has anybody had the same or a similar problem and can help me? Or can somebody give me some useful hints? Thanks in advance.
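One way to check whether a Jacobian is correct is to compare it against central finite differences on a small case. The plain-MATLAB sketch below does this for a hypothetical two-parameter model with a known analytic Jacobian; the model, the evaluation point and the step size h are illustrative assumptions. For the network, the same idea means perturbing one learnable entry at a time and re-evaluating the residuals, which is only practical as a spot check.

```matlab
% Hypothetical toy model (not the network from the question)
x = linspace(0,2,20)';
residualFcn = @(th) th(1)*exp(th(2)*x) - 1;                   % N-by-1 residuals
jacobianFcn = @(th) [exp(th(2)*x), th(1)*x.*exp(th(2)*x)];     % analytic N-by-P

theta = [1.3; -0.7];   % point at which the Jacobian is checked
h = 1e-6;              % finite-difference step

% Central differences: column k approximates d r / d theta_k
Jfd = zeros(numel(x), numel(theta));
for k = 1:numel(theta)
    dk = zeros(size(theta));
    dk(k) = h;
    Jfd(:,k) = (residualFcn(theta + dk) - residualFcn(theta - dk)) / (2*h);
end

Jan = jacobianFcn(theta);
relErr = norm(Jfd - Jan) / norm(Jan);
fprintf('relative error: %.2e\n', relErr);
```

A relative error of a similar order as h^2 plus rounding noise indicates that the analytic (or dlgradient-based) Jacobian matches; an error of order 1 usually points to a wrong quantity being differentiated, such as squared residuals instead of residuals.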

1 Comment

Matt J on 5 Sep 2023
Edited: Matt J on 5 Sep 2023
In the network you've shown there appears to be only 1 residual (assuming it's a regression network at all), since the final fully connected layer has only 1 output. With only 1 residual, there really is no appreciable difference between Levenberg-Marquardt and standard steepest descent.
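This can be checked numerically: for a single scalar residual e with a 1-by-P gradient row g, multiplying (g'*g + mu*I) by g'*e/(g*g' + mu) gives back g'*e, so the LM step is exactly the steepest-descent direction g'*e scaled by 1/(g*g' + mu). The sketch uses random g and e as stand-ins; P and mu are arbitrary choices.

```matlab
% With one scalar residual e and gradient row g = de/dtheta (1-by-P),
% the LM step is parallel to the steepest-descent direction g'*e.
P = 5;             % number of parameters (arbitrary for the check)
g = randn(1,P);    % 1-by-P Jacobian of the single residual
e = randn;         % the single residual value
mu = 10;           % damping factor

lmStep = (g'*g + mu*eye(P)) \ (g'*e);   % Levenberg-Marquardt step
sdStep = g'*e / (g*g' + mu);            % scaled steepest-descent step

disp(norm(lmStep - sdStep))             % zero up to rounding
```

Only the step length differs, and it is governed by mu, which is why the damping schedule matters more than the LM machinery in this case.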


Answers (1)

Matt J on 5 Sep 2023
Edited: Matt J on 5 Sep 2023
Levenberg-Marquardt would only be practical for very small networks and training data sizes. That is the case in the code you've shown, but if that is representative of your actual problem, it might be more appropriate just to use standard algorithms with 1 minibatch (i.e., with no division of the data into batches). That might improve convergence a lot, and would be a good idea to test before diving into Levenberg-Marquardt.
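To put "very small networks" in numbers: a full Levenberg-Marquardt step solves a P-by-P linear system with J'*J, which needs on the order of P^2 memory and roughly P^3 operations for a dense solve. The sketch counts the learnables (weights plus biases of the three fully connected layers) of the network in the question; the variable names are illustrative.

```matlab
% Learnables of the network in the question: 15 inputs, two hidden
% fully connected layers with 20 units, 1 output.
layerSizes = [15 20 20 1];
% Weights (in*out) plus biases (out) for each fully connected layer
P = sum(layerSizes(1:end-1).*layerSizes(2:end) + layerSizes(2:end));
HBytes = P^2 * 8;   % dense double-precision J'*J
fprintf('P = %d parameters, J''*J is %d-by-%d (%.1f MB)\n', P, P, P, HBytes/1e6);
```

This prints `P = 761 parameters, J'*J is 761-by-761 (4.6 MB)`. The code in the question avoids the full matrix by letting dlupdate solve one smaller system per learnable (the largest being the 400-element weight of the second fully connected layer). For a network with one million parameters, the same dense matrix would take 8 TB.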

3 Comments

Leo

Hi Matt J,
thanks a lot for your answer. The reason I want to use Levenberg-Marquardt is that I want to compare the new Deep Learning Toolbox with the old Neural Network Toolbox. In the past I used Levenberg-Marquardt (trainlm) because of its good performance, and now I also want to use it in the new Deep Learning Toolbox. When I use, e.g., sgd, the newer toolbox seems to be slower and to perform worse than the older toolbox. Would you agree? What is your opinion on that? And why does the new toolbox offer only those three algorithms?
Best regards
Matt J

I'm less familiar with the old toolbox, but:
(a) I don't see an sgd option in the old toolbox, so I don't see how you could have compared the new and old toolboxes fairly.
(b) The new toolbox is designed expressly for deep learning. The problem dimensions (data size, number of unknown parameters) in deep learning are larger than what the old toolbox seems to support, which limits the choice of algorithms.
(c) It is premature to conclude that the new toolbox is slow until you fix the problem with your minibatch selection. You are using a very large number of minibatches compared to your data size, which is why I recommended that you drop down to 1 minibatch, or at least something much smaller than 128.
Leo

Hi Matt,
thanks again for your answer:
a) Sorry, I meant gdm, not sgd. My mistake.
b) OK, that's the reason.
c) OK, I didn't catch that from your last answer. I will try it.
Best regards


Categories

Deep Learning Toolbox


Release

R2022b

Asked:

Leo
5 Sep 2023

Last comment:

Leo
7 Sep 2023
