Levenberg-Marquardt algorithm as custom training function using dlupdate

Question

1 vote

Hi all,

I'm trying to implement the Levenberg-Marquardt algorithm in MATLAB with dlupdate, as shown in the example (Use dlupdate to Train Network Using Custom Update Function) (https://de.mathworks.com/help/deeplearning/ref/dlupdate.html). The biggest challenge is calculating the Jacobian matrix.

The new Deep Learning Toolbox only offers the sgdm, rmsprop, and adam algorithms. Levenberg-Marquardt is not implemented.

Is there an easy way to calculate the Jacobian matrix with dlgradient?

This is my code right now. It works somehow, but it's very slow and I am not sure if it's correct.

clc;
clear;
%% Random data
XTrain = rand(15,1000)*0.1;
XTrain = XTrain-5;
TTrain = sum(XTrain.^2,1);                      % sum of squares of all 15 features
TTrain = TTrain/10;
%% Define Network
layers = [
featureInputLayer(15)
fullyConnectedLayer(20)
tanhLayer
fullyConnectedLayer(20)
tanhLayer
fullyConnectedLayer(1)
functionLayer(@(x) x)
];
net = dlnetwork(layers);
%% Training Options
miniBatchSize =  128;
numEpochs = 2;
numObservations = numel(TTrain);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
XTrain = dlarray(XTrain, 'CB');
TTrain = dlarray(TTrain, 'CB');
%% Train Network
for epoch = 1:numEpochs
    % Shuffle data.
    idx = randperm(numel(TTrain));
    XTrain = XTrain(:,idx);
    TTrain = TTrain(idx);
    for iteration = 1:numIterationsPerEpoch
        % Get a batch of data.
        indices = (iteration-1)*miniBatchSize+1:iteration*miniBatchSize;
        XBatch = XTrain(:,indices);
        TBatch = TTrain(:,indices);
        [loss, J,e] = dlfeval(@modelLoss,net,XBatch,TBatch);
        e = extractdata(e);
        updateFcn = @(net,J) lmFunction(net,J, e);
        net = dlupdate(updateFcn,net,J);
        % Report the loss
        fprintf('Loss: %f\n', extractdata(loss));
    end
end
Loss: 646.412170
Loss: 7.421673
Loss: 11.123130
Loss: 7.209948
Loss: 5.831234
Loss: 1.093452
Loss: 0.242863
Loss: 0.870988
Loss: 0.398049
Loss: 0.738563
Loss: 0.304589
Loss: 0.171688
Loss: 0.072695
Loss: 0.107126
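function [loss,J,e] = modelLoss(net,X,T)
    Y = forward(net,X);
    loss = mse(Y,T)/size(T,1);
    e = (Y-T).^2;                               % per-observation squared error
    numObs = size(X,2);                         % batch dimension ('CB' format)
    % Compute the Jacobian matrix. The first call only provides a table
    % with the same layout as net.Learnables to hold the result.
    J = dlgradient(e(1),net.Learnables);
    for j = 1:size(J.Layer,1)                   % iterate through the weights and biases of the layers
        layergrad = nan(numel(net.Learnables{:,"Value"}{j}),numObs);
        for i = 1:numObs                        % one backward pass per observation
            grad = dlgradient(e(i),net.Learnables);
            layergrad(:,i) = reshape(grad{j,"Value"}{1},[],1);
        end
        J{j,"Value"}{1} = layergrad;            % numParams-by-numObs block for learnable j
    end
end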
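function parameters = lmFunction(parameters,J,e)
    % J is the numParams-by-numObs block for one learnable and e is the
    % 1-by-numObs error vector, so each learnable is updated on its own.
    % update rule for mu is not yet implemented
    mu = 10;                                    % fixed damping factor
    H = J*J';                                   % Gauss-Newton approximation of the Hessian
    I = eye(size(H,1));
    lmupdate = (H+mu*I)\(J*e');                 % solve the damped normal equations
    parameters = parameters - reshape(lmupdate,size(parameters));
end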
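
For comparison, here is a minimal sketch of a textbook Levenberg-Marquardt step with a simple update rule for mu, written in plain MATLAB without any toolbox calls. It is not the code from the post: it assumes unsquared residuals r = y - t, a full N-by-P Jacobian J of those residuals, and a multiply-or-divide-by-10 damping schedule. The names lmStep, residualFcn, and expResidual are hypothetical, and the exponential curve fit is only a toy problem to exercise the step.

```matlab
% Toy problem (assumption): fit t = a*exp(b*x) with true a = 2, b = -3.
x = linspace(0, 1, 20)';
t = 2 * exp(-3 * x);
residualFcn = @(theta) expResidual(theta, x, t);

theta = [1; -1];          % initial guess for [a; b]
mu = 1e-2;                % initial damping factor
for k = 1:50
    [theta, mu] = lmStep(residualFcn, theta, mu);
end
disp(theta')              % should approach the true parameters

function [theta, mu] = lmStep(residualFcn, theta, mu)
    % One Levenberg-Marquardt step with adaptive damping.
    [r, J] = residualFcn(theta);      % r: N-by-1 residuals, J: N-by-P
    cost = sum(r.^2);
    g = J' * r;                       % gradient of 0.5*sum(r.^2)
    H = J' * J;                       % Gauss-Newton Hessian approximation
    for attempt = 1:10
        delta = (H + mu * eye(numel(theta))) \ g;
        thetaNew = theta - delta;
        rNew = residualFcn(thetaNew);
        if sum(rNew.^2) < cost
            theta = thetaNew;         % accept: behave more like Gauss-Newton
            mu = mu / 10;
            return
        end
        mu = mu * 10;                 % reject: behave more like gradient descent
    end
end

function [r, J] = expResidual(theta, x, t)
    % Residuals and analytic Jacobian of the model a*exp(b*x).
    y = theta(1) * exp(theta(2) * x);
    r = y - t;
    J = [exp(theta(2) * x), theta(1) * x .* exp(theta(2) * x)];
end
```

Note that the linear system is P-by-P, where P is the total number of parameters, so the cost of each step grows quickly with network size.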

Has anybody had the same or a similar problem in the past and can help me? Or can somebody give me some useful hints? Thanks in advance.

1 comment

Matt J on 5 Sep 2023

Edited: Matt J on 5 Sep 2023

In the network you've shown there appears to be only 1 residual (assuming it's a regression network at all), since the final fully connected layer has only 1 output. With only 1 residual, there really is no appreciable difference between Levenberg-Marquardt and standard steepest descent.
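
The single-residual case can be worked out in one line. This is a sketch that assumes a scalar residual $r(\theta)$ with gradient $g = \nabla_\theta r \in \mathbb{R}^{P}$ and a damping factor $\mu > 0$; these symbols are introduced here for illustration and do not appear in the post.

```latex
J = g^{\top} \in \mathbb{R}^{1 \times P}, \qquad
\delta = \left(J^{\top} J + \mu I\right)^{-1} J^{\top} r
       = \left(g g^{\top} + \mu I\right)^{-1} g \, r
       = \frac{r}{\mu + \lVert g \rVert^{2}} \, g
```

The last equality can be checked by substituting $\delta = \alpha g$ into $(g g^{\top} + \mu I)\,\delta = g\,r$, which gives $\alpha = r / (\mu + \lVert g \rVert^{2})$. The update $\theta \leftarrow \theta - \delta$ therefore points along $r\,g$, the gradient of $\tfrac{1}{2} r^{2}$, so under these assumptions it is a steepest-descent step with step size $1 / (\mu + \lVert g \rVert^{2})$.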


Answer 1

Matt J on 5 Sep 2023

Edited: Matt J on 5 Sep 2023

0 votes

Levenberg-Marquardt would only be practical for very small networks and training data sizes. That is the case in the code you've shown, but if that is representative of your actual problem, it might be more appropriate just to use standard algorithms with 1 minibatch (i.e., with no division of the data into batches). That might improve convergence a lot, and would be a good idea to test before diving into Levenberg-Marquardt.

3 comments (1 older comment not shown)

Matt J on 6 Sep 2023

I'm less familiar with the old toolbox, but,

(a) I don't see an sgd option in the old toolbox, so I don't see how you would have compared the new and old toolboxes fairly.

(b) the new toolbox is more expressly designed for Deep Learning. The problem dimension (data size, number of unknown parameters) in deep learning is greater than what the old toolbox seems to support, which limits the choice of algorithms.

(c) It is premature to conclude that the new toolbox is slow until you fix the problem with your minibatch selection. You are using a very large number of minibatches compared to your data size, which is why I recommended that you drop down to 1 minibatch, or at least something much smaller than 128.

Leo on 7 Sep 2023

Hi Matt,

thanks again for your answer:

a) Sorry, I meant gdm, not sgd. My fault.

b) OK, that's the reason why.

c) OK, I didn't realize that from your last answer. I will try that.

BR
