Levenberg-Marquardt algorithm as a custom training function using dlupdate
Hi everyone,
I'm trying to implement the Levenberg-Marquardt algorithm in MATLAB with dlupdate, following the example "Use dlupdate to Train Network Using Custom Update Function" (https://de.mathworks.com/help/deeplearning/ref/dlupdate.html). The biggest challenge is calculating the Jacobian matrix.
The new Deep Learning Toolbox workflow only offers the sgdm, rmsprop, and adam algorithms; Levenberg-Marquardt is not implemented.
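For reference, the textbook Levenberg-Marquardt step for a least-squares fit is sketched below (a standard formulation, not something dlupdate provides). LM minimizes the sum of squared residuals, so the Jacobian is taken of the residual vector itself, not of the squared errors; μ is the damping factor:

```latex
\mathbf{e}(\boldsymbol{\theta}) = \mathbf{y}(\boldsymbol{\theta}) - \mathbf{t} \in \mathbb{R}^{N}, \qquad
J_{ik} = \frac{\partial e_i}{\partial \theta_k}, \qquad
\Delta\boldsymbol{\theta} = -\left(J^{\top} J + \mu I\right)^{-1} J^{\top} \mathbf{e}
```

For large μ this approaches a short gradient-descent step, -(1/μ) Jᵀe (Jᵀe is the gradient of ½‖e‖²); for μ → 0 it becomes the Gauss-Newton step. If the Jacobian is stored transposed (parameters × observations), JᵀJ and Jᵀe become J*J' and J*e'.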
Is there an easy way to calculate the Jacobian matrix with dlgradient?
This is my code right now. It runs, but it is very slow, and I am not sure whether it is correct.
clc;
clear;
%% Random data
% 1000 observations with 15 features each, values in [-5, -4.9]
XTrain = rand(15,1000)*0.1;
XTrain = XTrain-5;
% Target: sum of the squared features of each observation, scaled by 1/10
TTrain = sum(XTrain.^2,1);
TTrain = TTrain/10;
%% Define Network
layers = [
    featureInputLayer(15)
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(1)
    functionLayer(@(x) x)   % identity output layer
    ];
net = dlnetwork(layers);
%% Training Options
miniBatchSize = 128;
numEpochs = 2;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
XTrain = dlarray(XTrain, 'CB');
TTrain = dlarray(TTrain, 'CB');
%% Train Network
for epoch = 1:numEpochs
    % Shuffle data (index targets by column, like the inputs).
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,idx);
    TTrain = TTrain(:,idx);
    for iteration = 1:numIterationsPerEpoch
        % Get a batch of data.
        indices = (iteration-1)*miniBatchSize+1:iteration*miniBatchSize;
        XBatch = XTrain(:,indices);
        TBatch = TTrain(:,indices);
        % Loss, per-learnable Jacobian blocks and residuals of the batch
        [loss,J,e] = dlfeval(@modelLoss,net,XBatch,TBatch);
        e = extractdata(e);
        % dlupdate calls the function once per learnable (weights or bias of one layer)
        updateFcn = @(param,Jblock) lmFunction(param,Jblock,e);
        net = dlupdate(updateFcn,net,J);
        % Report the loss
        fprintf('Loss: %f\n', extractdata(loss));
    end
end
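function [loss,J,e] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = mse(Y,T)/size(T,1);
    % Levenberg-Marquardt needs the residuals themselves, not their squares
    e = Y - T;
    numObs = size(X,2);
    % Use the Learnables table as a template: one Jacobian block per learnable,
    % stored as (number of parameter elements) x (number of observations)
    J = net.Learnables;
    for j = 1:size(J,1) % iterate through weights and biases of the layers
        J.Value{j} = zeros(numel(net.Learnables.Value{j}),numObs);
    end
    % One backward pass per residual fills the column of every block at once.
    % 'RetainData' keeps the trace alive for repeated dlgradient calls.
    for i = 1:numObs
        grad = dlgradient(e(i),net.Learnables,'RetainData',true);
        for j = 1:size(J,1)
            J.Value{j}(:,i) = extractdata(reshape(grad.Value{j},[],1));
        end
    end
end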
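function parameters = lmFunction(parameters,J,e)
    % J: (elements of this learnable) x (batch size) block of the transposed Jacobian
    % e: 1 x (batch size) residuals Y - T (LM expects residuals, not squared errors)
    % Because dlupdate treats each learnable separately, H is only this learnable's
    % diagonal block of J'*J; cross terms between learnables are ignored.
    % Update rule for mu is not yet implemented.
    mu = 10;
    H = J*J';
    I = eye(size(H,1));
    lmupdate = (H+mu*I)\(J*e');
    parameters = parameters - reshape(lmupdate,size(parameters));
end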
Has anybody had the same or a similar problem in the past and can help me? Or can somebody give me some useful hints? Thanks in advance.
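The mu update that lmFunction leaves open is usually handled with an accept/reject loop: try a step, keep it and shrink μ if the sum of squared errors drops, otherwise grow μ and retry. The sketch below shows that loop in plain MATLAB (no toolbox) on a small hypothetical curve fit, y = p(1)*exp(p(2)*x), with an analytic Jacobian and a single flattened parameter vector, so the full JᵀJ (including cross terms) is used. The factors 0.1 and 10, the starting μ, and the stopping thresholds are assumptions in the spirit of Marquardt's heuristic, not values taken from the question.

```matlab
% Levenberg-Marquardt with adaptive damping on a flattened parameter vector.
% Hypothetical example problem: fit y = p(1)*exp(p(2)*x) to noise-free data.
x = linspace(0,1,50)';
pTrue = [2; -1.5];
y = pTrue(1)*exp(pTrue(2)*x);

residualFcn = @(p) p(1)*exp(p(2)*x) - y;               % residuals, N-by-1
jacobianFcn = @(p) [exp(p(2)*x), p(1)*x.*exp(p(2)*x)]; % d(residual)/dp, N-by-P

p = [1; 0];      % initial guess
mu = 10;         % initial damping (assumed, same value as in lmFunction)
muDec = 0.1;     % assumed: shrink mu after an accepted step
muInc = 10;      % assumed: grow mu after a rejected step
r = residualFcn(p);
sse = r'*r;

for iter = 1:100
    J = jacobianFcn(p);
    H = J'*J;    % Gauss-Newton approximation of the Hessian
    g = J'*r;    % gradient of 0.5*sum(r.^2)
    while true
        step = -(H + mu*eye(numel(p))) \ g;
        pTrial = p + step;
        rTrial = residualFcn(pTrial);
        sseTrial = rTrial'*rTrial;
        if sseTrial < sse
            % Accept: move toward Gauss-Newton
            p = pTrial; r = rTrial; sse = sseTrial;
            mu = mu*muDec;
            break
        end
        % Reject: move toward gradient descent and retry
        mu = mu*muInc;
        if mu > 1e10
            break
        end
    end
    if sse < 1e-20 || mu > 1e10
        break
    end
end
fprintf('p = [%.4f, %.4f], SSE = %.2e\n', p(1), p(2), sse);
```

For the network in the question, the same loop would require concatenating all learnables into one vector (and the Jacobian blocks into one matrix) instead of updating each learnable separately through dlupdate.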
1 Comment
In the network you've shown there appears to be only 1 residual (assuming it's a regression network at all), since the final fully connected layer has only 1 output. With only 1 residual, there really is no appreciable difference between Levenberg-Marquardt and standard steepest descent.
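The claim about a single residual can be checked directly. If the residual vector has one entry e and its gradient is the column vector j (so J = jᵀ), then (j jᵀ + μI) j = (‖j‖² + μ) j, which gives a closed form for the damped solve:

```latex
\left(\mathbf{j}\mathbf{j}^{\top} + \mu I\right)^{-1}\mathbf{j} = \frac{\mathbf{j}}{\|\mathbf{j}\|^{2} + \mu}
\quad\Longrightarrow\quad
\Delta\boldsymbol{\theta} = -\left(\mathbf{j}\mathbf{j}^{\top} + \mu I\right)^{-1}\mathbf{j}\,e = -\frac{e\,\mathbf{j}}{\|\mathbf{j}\|^{2} + \mu}
```

Since e j is the gradient of ½e², the LM step is then just a steepest-descent step with step size 1/(‖j‖² + μ).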
Answers (1)
Levenberg-Marquardt would only be practical for very small networks and training data sizes. That is the case in the code you've shown, but if that is representative of your actual problem, it might be more appropriate just to use standard algorithms with 1 minibatch (i.e., with no division of the data into batches). That might improve convergence a lot, and would be a good idea to test before diving into Levenberg-Marquardt.
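A minimal sketch of the full-batch suggestion: a custom training loop in which the single "minibatch" is the whole data set. It assumes adamupdate as the built-in solver, rebuilds the question's synthetic data and network so it runs on its own (the identity functionLayer is omitted because it does not change the output), and uses a hypothetical budget of 500 iterations; whether that is enough for this data is something to check.

```matlab
% Same synthetic data and architecture as in the question
XTrain = rand(15,1000)*0.1 - 5;
TTrain = sum(XTrain.^2,1)/10;
layers = [
    featureInputLayer(15)
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(20)
    tanhLayer
    fullyConnectedLayer(1)];
net = dlnetwork(layers);
X = dlarray(XTrain,'CB');
T = dlarray(TTrain,'CB');

% Full-batch training: every iteration uses all observations
numIterations = 500;   % hypothetical iteration budget
averageGrad = [];
averageSqGrad = [];
for iteration = 1:numIterations
    [loss,gradients] = dlfeval(@fullBatchLoss,net,X,T);
    [net,averageGrad,averageSqGrad] = adamupdate(net,gradients, ...
        averageGrad,averageSqGrad,iteration);
end
fprintf('Final loss: %f\n', extractdata(loss));

function [loss,gradients] = fullBatchLoss(net,X,T)
    % Forward pass on the whole data set and gradient of the loss
    Y = forward(net,X);
    loss = mse(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```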
3 Comments
Leo
6 September 2023
Matt J
6 September 2023
I'm less familiar with the old toolbox, but,
(a) I don't see an sgd option in the old toolbox, so I don't see how you would have compared the new and old toolboxes fairly.
(b) the new toolbox is more expressly designed for Deep Learning. The problem dimension (data size, number of unknown parameters) in deep learning is greater than what the old toolbox seems to support, which limits the choice of algorithms.
(c) It is premature to conclude that the new toolbox is slow until you fix the problem with your minibatch selection. You are using a very large number of minibatches compared to your data size, which is why I recommended that you drop down to 1 minibatch, or at least something much smaller than 128.
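Regarding the older toolbox mentioned in (b): its shallow-network functions do ship a Levenberg-Marquardt trainer, trainlm, which is practical at this problem size. A minimal sketch, assuming the same synthetic data and two hidden layers of 20 neurons (hidden layers use the tansig, i.e. tanh-shaped, transfer function by default); the epoch count is a hypothetical choice:

```matlab
% Same synthetic data as in the question (observations in columns)
XTrain = rand(15,1000)*0.1 - 5;
TTrain = sum(XTrain.^2,1)/10;

% Two hidden layers of 20 neurons, trained with Levenberg-Marquardt
net = feedforwardnet([20 20],'trainlm');
net.trainParam.epochs = 200;         % hypothetical epoch budget
net.trainParam.showWindow = false;   % suppress the training window
[net,tr] = train(net,XTrain,TTrain);

% Evaluate on the full data set (default performance function is MSE)
YPred = net(XTrain);
fprintf('MSE: %g\n', perform(net,TTrain,YPred));
```

By default, train splits the data into training, validation, and test sets and rescales inputs and targets with mapminmax, and trainlm adapts the damping factor itself through trainParam.mu, mu_dec, and mu_inc.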
Leo
7 September 2023
Category: Deep Learning Toolbox