Stochastic gradient descent neural network: updating net in MATLAB
Is it possible to train a network (net) with stochastic gradient descent in MATLAB? If so, how?
I observe that each call to train appears to ignore what was learned from the previously trained data and updates all of the information from the new data alone. Incremental training would be helpful for large-scale training: if I train on the complete data, it takes a very long time.
For example, I would like to train iteratively on a small part of the data (say 100 points) at a time.
TF1 = 'tansig'; TF2 = 'tansig'; TF3 = 'tansig'; % transfer functions of the layers; TF3 is the transfer function of the output layer
net = newff(trainSamples.P, trainSamples.T, [NodeNum1, NodeNum2, NodeOutput], {TF1 TF2 TF3}, 'traingdx'); % create the network
net.trainFcn = 'traingdm'; % overrides the 'traingdx' passed to newff
net.trainParam.epochs = 1000;
net.trainParam.min_grad = 0;
net.trainParam.max_fail = 2000; % large value, effectively infinite
while true % iteratively takes 10 data points at a time
    p % => gets updated with the following 10 new data points
    t % => gets updated with the following 10 new data points
    [net, tr] = train(net, p, t, [], []);
end
Accepted Answer
Greg Heath
17 Jan 2014
2. Use the largest nndataset in the NNTBX for an example
help nndataset
doc nndataset
3. It is worthwhile to look at static correlation coefficients (help/doc corrcoef) and plots to help find
a. inputs that are so weakly correlated to all of the targets that those inputs can be omitted.
b. inputs that are so highly correlated with other inputs that they can be omitted
4. It may be useful to look at the input dimensionality reduction obtained with linear models (help regress)
5. Try to use as many defaults as possible when starting a NN design. Defaults that should be overridden should become evident during design trials.
6. What are the dimensions of your input and target matrices?
7. How many hidden nodes?
8. It is not necessary to use more than one hidden layer.
9. I used the largest nndataset
[ x,t] = building_dataset;
with size(x) = [ 14 4208 ], size(t) = [ 3 4208 ] and H = 70 hidden nodes. This yields about 10 times more training equations, 3*4208 = 12,624, than there are unknown weights, (14+1)*70 + (70+1)*3 = 1263.
Since the net was not close to being overfit, I only used a training set and obtained an adjusted R-squared of 0.99 in 72 seconds with a straightforward FITNET design.
However, a design by looping over 10 randomly chosen subsets took 109 seconds. The syntax after the random shuffling using randperm(4208) was
M = 420 % floor(4208/10)
imax = 10
for i=1:imax
k = 1+M*(i-1) : M*i;
[ net tr y( : , k ) ] = train( net, x( : , k ), t( : , k ) );
end
This probably doesn't show a savings because 14*4208 is not too large for the default trainlm.
I think all you have to do is use a larger data set (enough to choke trainlm) and a more appropriate training function, e.g., trainscg or trainrp.
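The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the timing experiment described in the answer: it swaps in trainscg, calls configure on the full data first so that sizes and normalization settings are set once (my assumption about how to keep later chunks scaled consistently), and runs only a few epochs per chunk over several reshuffled passes. The values of numChunks, numPasses and the epoch limit are made up for the example.

```matlab
% Sketch: repeated passes over random chunks, reusing the same net each time.
[x, t] = building_dataset;          % 14x4208 inputs, 3x4208 targets
N = size(x, 2);

H = 70;                             % hidden nodes, as in the answer
net = fitnet(H, 'trainscg');        % scaled conjugate gradient instead of trainlm
net.divideFcn = 'dividetrain';      % training set only, as in the answer
net.trainParam.showWindow = false;  % no training GUI inside the loop
net.trainParam.epochs = 20;         % assumed: only a few epochs per chunk
net = configure(net, x, t);         % set sizes and normalization from all data

numChunks = 10;                     % assumed
numPasses = 5;                      % assumed
M = floor(N / numChunks);           % chunk size

for pass = 1:numPasses
    idx = randperm(N);              % reshuffle the columns on every pass
    for i = 1:numChunks
        k = idx(1 + M*(i-1) : M*i); % column indices of the current chunk
        net = train(net, x(:, k), t(:, k));
    end
end

y = net(x);                         % outputs on the full data set
mseAll = mean((t(:) - y(:)).^2);    % pooled mean squared error
```

Whether this is actually faster than one call to train on all of the data depends on the data size and the training function, as the timings in the answer show.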
Hope this helps.
Thank you for formally accepting my answer
Greg
Comments: 3
Greg Heath
19 Jan 2014
1. Radically reduce the input dimensionality
2. It may not be necessary to use most of the data for training.
3. Consider combining multiple nets that are designed on different parts of the data
4. Use narxnet
5. Do not use 'dividerand'; preserve the data order
6. Determine the significant input and feedback correlation lags
7. Use trainscg or trainrp for large training sets
8. Use 1 hidden layer
9. Practice on the two longest MATLAB timeseries example data sets
help nndatasets
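A minimal sketch of how points 4, 5, 7 and 8 might be combined on one of the example data sets. The delays ID and FD and the hidden layer size H are placeholders chosen for illustration, not values determined from the correlation analysis recommended in point 6, and 'divideblock' is just one way to keep the data order.

```matlab
% Sketch: open-loop NARX net on an example time series, keeping the data order.
[X, T] = maglev_dataset;                  % 1x4001 cell arrays

ID = 1:2;                                 % assumed input delays
FD = 1:2;                                 % assumed feedback delays
H  = 10;                                  % one hidden layer, default size
net = narxnet(ID, FD, H, 'open', 'trainscg');
net.divideFcn = 'divideblock';            % contiguous blocks instead of 'dividerand'
net.trainParam.showWindow = false;        % no training GUI

% Shift the series to fill the delay lines, then train.
[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
[net, tr] = train(net, Xs, Ts, Xi, Ai);

Ys = net(Xs, Xi, Ai);                     % open-loop (one-step-ahead) outputs
mseOpen = perform(net, Ts, Ys);           % default performance is mse
```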
More Answers (2)
Greg Heath
19 Jan 2014
Answer by Alper Alimoglu about 8 hours ago Edited by Alper Alimoglu about 8 hours ago
> My data set consists of 1 000 000 data points. I wasn't able to, for example, train on 100 data points and then continue iteratively, training on only 100 data points at each step and combining that with the previously trained portion of the data.
> I am using a neural network to do prediction.
1. You should have said, first, that you wanted a net for prediction. That changes the approach from a regression/curve-fitting design
help/doc fitnet % calls feedforwardnet
help/doc newfit (obsolete) % calls newff (obsolete)
to a time-series design
help/doc timedelaynet (input delays, no output feedback)
help/doc narxnet (input delays and delayed output feedback)
> My data set is formed by 1 000 000 data.
2. You should have also given the dimensions of the input and output. I will assume SISO.
3. N = 10^6 ==> You should first practice on the longest MATLAB timeseries nndatasets
help/doc nndatasets
% exchanger_dataset - Heat exchanger dataset.
[ X, T ] = exchanger_dataset;
whos
% Name      Size        Bytes     Class    Attributes
% T         1x4000      272000    cell
% X         1x4000      272000    cell
% maglev_dataset - Magnetic levitation dataset.
[ X, T ] = maglev_dataset;
whos
% Name      Size        Bytes     Class    Attributes
% T         1x4001      272068    cell
% X         1x4001      272068    cell
4. You can only predict as far ahead as the data will let you. This is determined by the significant lags of the input/output crosscorrelation function and/or the significant lags of the output autocorrelation function.
5. Since N is large, use the fft to calculate the correlation functions instead of the BUGGY NNCORR or the correlation functions from other toolboxes.
6. It is worthwhile to divide the data into many subsections to determine the correlation statistics that are consistent for all of the data. Plots should help.
7. See some of my recent posts on how to determine the significant thresholds and lags.
greg nncorr
8. The number of hidden nodes is chosen by trial and error if the default H = 10 is unsatisfactory.
9. Since N is huge and the default net.divideFcn = 'dividerand' destroys correlations, use 'dividetrain' in the first set of trials to determine good values for H and the delays (ID,FD).
> I wasn't able to come up with a solution to train on individual small portions (100 points) of the data and combine each with the already trained portion. I also implemented a recurrent neural network for this approach, but again I face the same problem.
10. Just because you have N ~ 10^6, there is no rule that says you have to use all of it to train one net at the given data rate. Consider combining smaller nets designed over different intervals of time AND multiple parallel nets designed over the same time but using interleaving samples. One hidden layer per net is sufficient.
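The FFT-based correlation estimate mentioned in point 5 could look something like the sketch below. It uses only base MATLAB functions on a synthetic AR(1) series that stands in for the real target series. The significance bound 1.96/sqrt(N) is the usual approximate 95% white-noise level and is my assumption here; it is not necessarily the threshold used in the posts referred to in point 7.

```matlab
% Sketch: autocorrelation of a standardized series via the FFT.
rng(0);                                   % reproducible synthetic example
N = 10000;
e = randn(1, N);
t = filter(1, [1 -0.8], e);               % hypothetical AR(1) test series

zt = (t - mean(t)) / std(t, 1);           % zero mean, unit (biased) variance
nfft = 2^nextpow2(2*N - 1);               % zero-pad to avoid circular wrap-around
F = fft(zt, nfft);
acf = real(ifft(abs(F).^2)) / N;          % biased estimate; index 1 is lag 0
maxLag = 50;                              % assumed largest lag of interest
acf = acf(1:maxLag + 1);                  % lags 0..maxLag

thresh = 1.96 / sqrt(N);                  % assumed approximate 95% bound
sigLags = find(abs(acf(2:end)) > thresh); % positive lags exceeding the bound
```

Running the same estimate on several subsections of the data, as point 6 suggests, would show which lags are significant consistently rather than in one interval only.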
Muthu Kumar
14 Jul 2021
Edited: Muthu Kumar
14 Jul 2021
How can the same algorithm be used for DC-DC converter control?