
Function approximation: neural network great 'on paper', but when simulated, results are very bad?

I need some help with NN because I don't understand what is happening. One hidden layer, I = 4, H = 1:20, O = 1. I run each net architecture 10 times with different initial weights (default initnw left in place). I have 34 datasets in total, which were divided 60/20/20 when using the Levenberg-Marquardt algorithm. Mse_goal = 0.01*mean(var(t',1)). I calculate NMSE and R^2, choose the best R^2, and for that net check the performance of each subsample, the regression plots, and the RMSE. R^2 is usually around 0.95, and R for each subset is around 0.98. But when I simulate the network with a completely new set of data, the estimates deviate quite a lot. It is not because of extrapolation. Data are normalized with mapminmax; transfer functions are tansig and purelin.
Trainbr was actually my first choice, since I have a small dataset and trainbr doesn't need a validation set (MATLAB R2015a), but it is awfully slow. I ran a net with trainbr and we are talking hours versus minutes with trainlm.
I've read a ton of Greg Heath's posts and tutorials and found very valuable information there; however, I'm still stuck and see no way out.
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all
load varUA_K
x = MP_UA_K;
t = UA_K;
var_t = mean(var(t',1)); % target variance (biased estimate)
[inputs,obs] = size(x); % number of input variables and observations
hiddenLayerSize = 20; %max number of neurons
numNN = 10; % number of training runs
neurons = [1:hiddenLayerSize]';
training_no = 1:numNN;
obs_no = 1:obs;
nets = cell(hiddenLayerSize,numNN);
trainOutputs = cell(hiddenLayerSize,numNN);
valOutputs = cell(hiddenLayerSize,numNN);
testOutputs = cell(hiddenLayerSize,numNN);
Y_all = cell(hiddenLayerSize,numNN);
performance = zeros(hiddenLayerSize,numNN);
trainPerformance = zeros(hiddenLayerSize,numNN);
valPerformance = zeros(hiddenLayerSize,numNN);
testPerformance = zeros(hiddenLayerSize,numNN);
e = zeros(numNN,obs);
e_all = cell(hiddenLayerSize,numNN);
tr = cell(hiddenLayerSize,numNN); % training records
NMSE = zeros(hiddenLayerSize,numNN);
rmse_train = zeros(hiddenLayerSize,numNN);
rmse_test = zeros(hiddenLayerSize,numNN);
rmse = zeros(hiddenLayerSize,numNN);
r_train = zeros(hiddenLayerSize,numNN);
r_val = zeros(hiddenLayerSize,numNN);
r_test = zeros(hiddenLayerSize,numNN);
r = zeros(hiddenLayerSize,numNN);
Rsq = zeros(hiddenLayerSize,numNN);
for j=1:hiddenLayerSize
% Choose a Training Function
% For a list of all training functions type: help nntrain
% 'trainlm' is usually fastest.
% 'trainbr' takes longer but may be better for challenging problems.
% 'trainscg' uses less memory. Suitable in low memory situations.
trainFcn = 'trainbr'; % Bayesian Regularization backpropagation.
% Create a Fitting Network
net = fitnet(j,trainFcn);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
% data are sorted by the dependent variable; roughly every third sample
% is a test sample
net.divideFcn = 'divideind'; % Divide data by index
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainInd = [1:3:34,2:3:34];
% net.divideParam.valInd = [5:5:30];
net.divideParam.testInd = [3:3:34];
mse_goal = 0.01*var_t;
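% (this goal equals NMSE = 0.01 relative to var_t, i.e. a target R^2 of 0.99)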
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean Squared Error
net.trainParam.goal = mse_goal;
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
for i=1:numNN
% Train the Network
net = configure(net,x,t);
disp(['No. of hidden nodes ' num2str(j) ', Training ' num2str(i) '/' num2str(numNN)])
[nets{j,i}, tr{j,i}] = train(net,x,t);
y = nets{j,i}(x);
e(i,:) = gsubtract(t,y);
e_all{j,i}= e(i,:);
trainTargets = t .* tr{j,i}.trainMask{1};
%valTargets = t .* tr{j,i}.valMask{1};
testTargets = t .* tr{j,i}.testMask{1};
trainPerformance(j,i) = perform(net,trainTargets,y);
%valPerformance(j,i) = perform(net,valTargets,y);
testPerformance(j,i) = perform(net,testTargets,y);
performance(j,i)= perform(net,t,y);
rmse_train(j,i)=sqrt(trainPerformance(j,i));
%rmse_val(j,i)=sqrt(valPerformance(j,i));
rmse_test(j,i)=sqrt(testPerformance(j,i));
rmse(j,i)=sqrt(performance(j,i));
% outputs of all networks
Y_all{j,i}= y;
trainOutputs {j,i} = y .* tr{j,i}.trainMask{1};
%valOutputs {j,i} = y .* tr{j,i}.valMask{1};
testOutputs {j,i} = y .* tr{j,i}.testMask{1};
[r(j,i)] = regression(t,y);
[r_train(j,i)] = regression(trainTargets,trainOutputs{j,i});
%[r_val(j,i)] = regression(valTargets,valOutputs{j,i});
[r_test(j,i)] = regression(testTargets,testOutputs{j,i});
NMSE(j,i) = mse(e_all{j,i})/var_t; % normalized MSE (var_t computed above)
% coefficient of determination
Rsq(j,i) = 1-NMSE(j,i);
end
end
% Summary statistics over the numNN trials for each architecture,
% computed once after all training runs (min/max over dimension 2)
[minperf_train,I_train] = min(trainPerformance,[],2);
% [minperf_val,I_valid] = min(valPerformance,[],2);
[minperf_test,I_test] = min(testPerformance,[],2);
[minperf,I_perf] = min(performance,[],2);
[maxRsq,I_Rsq] = max(Rsq,[],2);
[train_min,train_min_I] = min(minperf_train);
% [val_min,val_min_I] = min(minperf_val);
[test_min,test_min_I] = min(minperf_test);
[perf_min,perf_min_I] = min(minperf);
[Rsq_max,Rsq_max_I] = max(maxRsq);
figure(4)
hold on
xlabel('observation no.')
ylabel('targets')
scatter(obs_no,trainTargets,'b')
% scatter(obs_no,valTargets,'g')
scatter(obs_no,testTargets,'r')
hold off
figure(5)
hold on
xlabel('neurons')
ylabel('min. performance')
plot(neurons,minperf_train,'b',neurons,minperf_test,'r',neurons,minperf,'k')
hold off
figure(6)
hold on
xlabel('neurons')
ylabel('max Rsq')
scatter(neurons,maxRsq,'k')
hold off
% View the Network
%view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, ploterrhist(e)
%figure, plotregression(t,y)
%figure, plotfit(net,x,t)
% Deployment
% Change the (false) values to (true) to enable the following code blocks.
% See the help for each generation function for more information.
savefig(figure(4),'figure4.fig')
savefig(figure(5),'figure5.fig')
savefig(figure(6),'figure6.fig')
if (false)
% Generate MATLAB function for neural network for application
% deployment in MATLAB scripts or with MATLAB Compiler and Builder
% tools, or simply to examine the calculations your trained neural
% network performs.
genFunction(net,'nn_UA_K_BR');
y = nn_UA_K_BR(x);
end
% save all workspace variables to a separate file for further analysis
save ws_UA_K_BR

Accepted Answer

Greg Heath, 3 September 2016 (edited 5 September 2016)
% I need some help with NN because I don't understand what is happening.
% One hidden layer, I = 4, H = 1:20, O = 1. I run each net architecture
% 10 times with different initial weights (default initnw left in place).
% I have 34 datasets in total
Do you mean data points N = 34?
It typically takes ~ 10 to 30 data points per dimension to
adequately characterize a distribution. For a 4-D distribution I'd recommend
40 <~ Ntrn <~ 120
% which were divided 60/20/20 when using the Levenberg-Marquardt
Ntrn = 34-2*round(0.2*34) = 20
Hub = (20-1)/(4+1+1) = 3.2
indicating you really don't have enough data to adequately characterize a 4-D distribution.
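In MATLAB terms, the sizing check above can be sketched as follows (a sketch only; N, I, O and the 60/20/20 split are taken from the question):
% Sizing check: with a 60/20/20 split of N = 34 points, Hub is the
% largest hidden-layer size H for which the Nw = (I+1)*H + (H+1)*O
% weights can still be estimated from the Ntrneq training equations.
N = 34; I = 4; O = 1;
Ntrn = N - 2*round(0.2*N) % = 20 training points
Ntrneq = Ntrn*O; % number of training equations
Hub = (Ntrneq - O)/(I + O + 1) % = 3.2, so H > 3 risks overfitting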
You should consider:
1. Dimensionality reduction
2. k-fold crossvalidation
3. Adding new data with the same mean and covariance (stdv + correlations) matrix
% algorithm. Mse_goal = 0.01*mean(var(t',1)). I calculate NMSE and R^2,
% choose the best R^2, and for that net check the performance of each
% subsample, the regression plots, and the RMSE. R^2 is usually around
% 0.95, and R for each subset is around 0.98. But when I simulate the
% network with a completely new set of data, the estimates deviate quite
% a lot. It is not because of extrapolation.
No. It probably is. Your training data subset is insufficiently
large for 4 dimensions.
I would begin with minimizing H with dividetrain (see the DIVIDETRAIN
script in my answer below). Then consider k-fold crossvalidation.
% Data are normalized with mapminmax; transfer functions are tansig and
% purelin.
% Trainbr was actually my first choice, since I have a small dataset and
% trainbr doesn't need a validation set (MATLAB R2015a), but it is
% awfully slow. I ran a net with trainbr and we are talking hours versus
% minutes with trainlm.
This may be a BUG. Let MATLAB know. What version are you using?
>> ver
% I've read a ton of Greg Heath's posts and tutorials and found very
% valuable information there; however, I'm still stuck and see no way
% out.
It typically takes ~10 to 30 data points per dimension to adequately
characterize a distribution.
I suggest calculating the means and stdvs for each data set to see how
representative your training data is of the total 4-D distribution that
includes the new datasets. 2-D or 3-D color-coded projections may be
helpful.
Hope this helps.
Greg
12 Comments
Tea, 27 September 2016
Since the discussion with Dr. Heath helped me a lot, I feel obligated to share what I learnt after implementing all the advice (at least I hope I learnt):
- I had to add datapoints to datasets 1 and 2 (which now contain around 60 and 70 datapoints, respectively). This was not possible for dataset 3 (which contains 30 datapoints)
- I used 10-fold cross-validation (xval)
- tried trainlm, trainbr, and recently traingdx learning algorithms
- obtained better results than before, which are actually quite satisfying.
I also experimented with leave-one-out-xval (LOOCV).
My conclusions are that:
- xval shows hope for learning nets on small sample size
- xval is quite demanding computationally, especially when used without early stopping (in my case Bayesian regularization; I set the max number of epochs to 1000)
- if you think 10-fold xval is demanding, try LOOCV... I haven't had time or computational resources to experiment with this much, but from what I saw, results obtained were not much better or not better at all than 10-fold xval.
I have currently merged all three datasets (around 160 datapoints) for making the net, because I need these results for comparison. I'm now unsure when a dataset stops being 'small', but somehow believe that 160 is still not too large to ignore the benefits of xval.
I'll try to post my code at some point; at the moment I don't have time to adapt it for posting, since English is not my native language. A sketch of the xval loop follows below.
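A minimal sketch of such a 10-fold xval loop (hypothetical, not the poster's code; it assumes x and t as in the original script, an arbitrary H = 5, and cvpartition from the Statistics and Machine Learning Toolbox):
% Minimal 10-fold cross-validation sketch for a fitnet regressor.
% Assumes x (I x N inputs) and t (1 x N targets) as in the script above.
k = 10;
N = size(x,2);
cv = cvpartition(N,'KFold',k); % random fold assignment
vart = mean(var(t',1)); % (biased) target variance
nmse = zeros(k,1);
for f = 1:k
    trnIdx = training(cv,f); % logical index of training points
    tstIdx = test(cv,f); % logical index of held-out points
    net = fitnet(5,'trainbr'); % H = 5 hidden nodes, Bayesian regularization
    net.divideFcn = 'dividetrain'; % trainbr needs no validation set
    net.trainParam.epochs = 1000;
    net.trainParam.showWindow = false;
    net = train(net, x(:,trnIdx), t(trnIdx));
    yTst = net(x(:,tstIdx));
    nmse(f) = mse(t(tstIdx) - yTst)/vart; % normalized test MSE per fold
end
fprintf('10-fold NMSE: mean %.3f, std %.3f\n', mean(nmse), std(nmse))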
Greg Heath, 28 September 2016 (edited 28 September 2016)
Please post your data in *.m or *.txt.
NEVERMIND! SEE BELOW.


More Answers (1)

Greg Heath, 28 September 2016
AN OPTIMISTIC ESTIMATE USING DIVIDETRAIN:
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by Neural Fitting app
% Created 09-Aug-2016 18:33:13
% This script assumes these variables are defined:
%
% MP_UA_K - input data.
% UA_K - target data.
close all, clear all, clc, plt=0, tic
format short e
load varUA_K
whos
% Name       Size    Bytes  Class
% MP_UA_K    3x34    816    double
% UA_K       1x34    272    double
% plt        1x1     8      double
x = MP_UA_K; t = UA_K;
[I N] = size(x), [O N] = size(t) % [3 34], [1 34]
vart1 = mean(var(t',1)) % 1.0259e+05
xt = [x;t]; minmaxxt = minmax(xt)
% minmaxxt = [ 2.0700e+02  7.6000e+02
%              3.5900e+02  1.0180e+03
%              1.5100e-02  2.8500e-01   % ~10^4 LOWER than the others!!!
%              8.1300e+02  2.4070e+03 ]
x1 = x(1,:); x2 = x(2,:); x3=x(3,:);
plt = plt+1, figure(plt)
subplot(2,2,1), plot(x1,'k','LineWidth',2)
subplot(2,2,2), plot(x2,'b','LineWidth',2)
subplot(2,2,3), plot(x3,'g','LineWidth',2)
subplot(2,2,4), plot( t,'k','LineWidth',2)
GEH1 = 'DOES NOT LOOK PROMISING!!!'
Ntrneq = N*O % DIVIDETRAIN
Hub = (Ntrneq-O)/(I+O+1) % 6.6
Hmin = 0, dH = 1, Hmax = 10
Ntrials = 10
NMSE = zeros(Ntrials,numel(Hmin:dH:Hmax)); % trials x architectures, in percent
rng(0)
j=0
for h = Hmin:dH:Hmax
j=j+1
if h==0
net = fitnet([]);
Nw = (I+1)*O
else
net = fitnet(h);
Nw = (I+1)*h+(h+1)*O
end
Ndof = Ntrneq-Nw
MSEgoal = 0.01*max(Ndof,0)*vart1/Ntrneq
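% (the goal shrinks with Ndof/Ntrneq to offset the optimistic bias of
% training-set MSE when Nw weights are fit to Ntrneq equations)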
net.divideFcn = 'dividetrain';
net.trainParam.goal = MSEgoal;
net.trainParam.min_grad = MSEgoal/100;
for i = 1:Ntrials
i = i
net = configure(net,x,t);
[net tr y e ] = train(net,x,t);
NMSE(i,j) = 100*mse(e)/vart1;
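% (NMSE is stored in percent, so Rsquare(%) = 100 - NMSE)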
end
end
NMSE = NMSE
minNMSE = min(NMSE)
medNMSE = median(NMSE)
meanNMSE = mean(NMSE)
maxNMSE = max(NMSE)
totaltime = toc % 96 sec
% NONOVERFITTING 0 <= H <= 6 < Hub = 6.6
% H            0     1     2     3     4     5     6
% minNMSE  = 48.3  33.3  19.4  10.7   8.7   7.2   6.6
% medNMSE  = 48.3  33.3  24.5  17.0  10.8   8.1   7.4
% meanNMSE = 48.3  40.0  33.4  16.7  12.1   8.3   7.5
% maxNMSE  = 48.3 100.0  76.7  26.6  22.3  11.2   8.4
GEH2 = 'With H = 6 can get Rsquare = 93.4 !'
% OVERFITTING Hub = 6.6 < 7 <= H <= 10
% H            7     8     9    10
% minNMSE  = 5.97  5.96  5.96  5.96
% medNMSE  = 6.22  5.96  5.96  5.96
% meanNMSE = 6.47  6.02  6.02  5.96
% maxNMSE  = 8.16  6.42  6.53  5.96
GEH3 = 'With OVERFITTING can only get 94.0 !'
% NMSE = NMSE
% Columns 1 through 6
%
% 4.8282e+01 3.3313e+01 2.6913e+01 1.9122e+01 9.3848e+00 1.1225e+01
% 4.8282e+01 3.3313e+01 2.2170e+01 1.0726e+01 1.0602e+01 8.7863e+00
% 4.8282e+01 3.3313e+01 2.1539e+01 1.5017e+01 1.3730e+01 7.8872e+00
% 4.8282e+01 3.3313e+01 2.0225e+01 1.5821e+01 1.1673e+01 7.5152e+00
% 4.8282e+01 3.3313e+01 1.9368e+01 1.2777e+01 1.2493e+01 7.6062e+00
% 4.8282e+01 3.3313e+01 6.2003e+01 1.1113e+01 2.2313e+01 8.0091e+00
% 4.8282e+01 3.3313e+01 7.6666e+01 1.8246e+01 1.0316e+01 8.2620e+00
% 4.8282e+01 3.3313e+01 3.1822e+01 1.9369e+01 1.1088e+01 8.6014e+00
% 4.8282e+01 1.0000e+02 3.2846e+01 1.8222e+01 8.7025e+00 8.1623e+00
% 4.8282e+01 3.3313e+01 2.0608e+01 2.6597e+01 1.0326e+01 7.2022e+00
%
% Columns 7 through 11
%
% 6.5668e+00 5.9673e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.2365e+00 6.6139e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3531e+00 5.9903e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3784e+00 8.1612e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.3713e+00 6.8227e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 7.6491e+00 6.2822e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.3575e+00 6.6919e+00 6.4153e+00 5.9635e+00 5.9635e+00
% 6.6564e+00 6.1604e+00 6.0776e+00 5.9635e+00 5.9635e+00
% 7.0978e+00 6.0554e+00 5.9635e+00 5.9635e+00 5.9635e+00
% 8.0990e+00 5.9676e+00 5.9635e+00 6.5254e+00 5.9635e+00
Hope this helps.
Greg
2 Comments
Tea, 28 September 2016
I believe that from the perspective of a person with a lot of experience in ANNs (such as yourself), Rsquare = 93.4 is not satisfying, but to me it's really good. Even a lower Rsq, such as 0.85, is great. I tested the networks with completely independent data (around 20 points for each group), and the results are good. When I say good, I check Rsq, R, and mse, but also the percentage deviation from experimental results. The thing is, one influential observation can mess with all three indicators I mentioned, but I understand the background of the problem and I'm completely aware that even the most perfect net designed on a large sample won't be able to predict such a deviation.
I am working with experimental data of material behaviour (static and dynamic), and even 10 experiments on the same material from the same batch show a significant amount of scatter, so it is completely unreasonable to expect Rsq = 0.99. Maybe if you have a 'perfect' dataset, but perfect often means 'polished' and not representative of the entire population.
I think it is important, when using ANN as a tool, to be aware of the limitations of such an approach, of your own field of research, etc. Having some statistical background can help.
I knew I could improve my results, and I still don't know the upper bound of possible improvement - however, I'm quite satisfied. And MATLAB Answers helped me a lot.
Greg Heath, 28 September 2016
I just ran your 4-input case with DIVIDETRAIN. Although Hub = 5.5 is 1 smaller than the 6.6 of the 3-input case, the information from the new input does allow Rsquare = 0.997 for H = 5. In addition, overfitting with H >= 6 does not significantly improve performance.
% NONOVERFITTING 0 <= H <= 5 < Hub = 5.5
% H            0     1      2     3     4     5
% minNMSE  = 10.5   9.82   2.47  0.83  0.51  0.32
% medNMSE  = 10.5   9.82   4.64  1.93  0.94  0.47
% meanNMSE = 10.5   9.82  14.7   2.48  1.00  0.48
% maxNMSE  = 10.5   9.82 100.00  4.68  2.07  0.79
GEH2 = 'With H = 5 can get Rsquare = 99.7 !'
% OVERFITTING Hub = 5.5 < 6 <= H <= 10
% H            6     7     8     9    10
% minNMSE  = 0.30  0.30  0.30  0.30  0.30
% medNMSE  = 0.30  0.30  0.30  0.30  0.30
% meanNMSE = 0.35  0.41  0.30  0.30  0.30
% maxNMSE  = 0.55  0.97  0.30  0.30  0.30
GEH3 = 'Cannot do significantly better by OVERFITTING!'
Hope this helps.
Greg
P.S. I used the optimistically biased DIVIDETRAIN results to get an upper bound on performance. Although the bias can be mitigated somewhat by multiplying NMSE by Ntrneq/Ndof, I prefer to use estimates based on nontraining data.
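A minimal sketch of that adjustment, reusing the quantities from the DIVIDETRAIN script above (the H = 5 choice and the max(Ndof,1) guard are illustrative assumptions):
% DOF adjustment from the P.S.: multiply the optimistically biased
% training NMSE by Ntrneq/Ndof. Column H+1 of NMSE holds the H-node runs.
H = 5; % hidden-layer size of interest
Nw = (I+1)*H + (H+1)*O; % number of weights
Ndof = Ntrneq - Nw; % estimation degrees of freedom
NMSEa = NMSE(:,H+1)*Ntrneq/max(Ndof,1); % adjusted NMSE (percent)
Rsqa = 100 - NMSEa; % corresponding adjusted Rsquare (%)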

