Optimal hidden nodes number
Hello everyone, I would like to find the optimal number of hidden nodes using structured trial and error. I ran the following simulation:
Hmin = 1;
Hmax = 30;
dH = 1;
NTrials = 5;
For each value of H, I took the minimum error over the 5 trials to plot the following graph:

My question is how to determine the optimal number of hidden nodes from this graph? Thank you.
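The structured search described above could be scripted along the following lines. This is a minimal sketch using fitnet from the Deep Learning Toolbox; x and t stand for your input and target matrices, which are not shown in the post:

```matlab
% Structured trial-and-error over the number of hidden nodes.
% x: I-by-N input matrix, t: O-by-N target matrix (your data).
Hmin = 1; Hmax = 30; dH = 1; NTrials = 5;
Hvals  = Hmin:dH:Hmax;
minErr = zeros(size(Hvals));
for i = 1:numel(Hvals)
    err = zeros(1, NTrials);
    for k = 1:NTrials
        net = fitnet(Hvals(i));          % new random initial weights each trial
        net.trainParam.showWindow = false;
        net = train(net, x, t);
        y = net(x);
        err(k) = mse(net, t, y);         % mean-squared error
    end
    minErr(i) = min(err);                % best of NTrials trials for this H
end
plot(Hvals, minErr);
xlabel('Hidden nodes H'); ylabel('min MSE over trials');
```

Note that judging H by the error on the training data alone rewards overfitting; comparing validation-set performance (e.g. tr.best_vperf from [net,tr] = train(...)) is usually a safer basis for picking H.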
4 Comments
Jan
8 Jul 2017
I have no idea what you are doing or how you created the diagram from those 4 variables. Perhaps some more details would help to answer your question.
Hamza Ali
8 Jul 2017
Joshua
8 Jul 2017
Hamza, what specifically are you trying to get from the graph (minimum, inflection point, zero slope, etc.)? Unless you provide detailed information about what you're actually looking for and what algorithm was used to make the graph, we can't help you.
Hamza Ali
9 Jul 2017
Accepted Answer
More Answers (2)
Walter Roberson
8 Jul 2017
0 votes
When you use neural networks, the lowest theoretical error always occurs at the point where the state matrices are large enough to include exact copies of every sample that you ever trained on, plus the known output for each of those samples. For example if you train on 50000 samples each of 17 features, then a neural network that is 50000 * 17 large (exact copy of input data) + 50000 large (exact copy of output data) will have an error rate of 0 for that data.
Such a system might be pretty useless on other data.
Likewise, if you were doing clustering, then you could achieve 100% accuracy by using one cluster per unique input sample.
So... before you can talk about "optimal", you need to define exactly what you mean by that.
There are a lot of things for which the Pareto Rule applies: "for many events, roughly 80% of the effects come from 20% of the causes". This applies recursively -- of the 20% that remains after the first pass, 80% will be explained by 20% of the second layer of causes. And you can keep going with that. But it is common that the cost of each layer you go through is roughly the same, so addressing the first 80% of the second 20% of the original costs about as much as dealing with the original 80% did, and doing the next step costs about as much as everything already spent, and so on. Basically, for each step closer to 100% accuracy you get, the costs double.
Where is the "optimal"? Well that depends on whether you have resource limitations or if you prize 100% accuracy more than anything.
Greg Heath
13 Jul 2017
Edited: Greg Heath, 13 Jul 2017
0 votes
I = 5, O = 1, N = 46824
Ntrn ~ 0.7*N = 32777
Ntrneq = Ntrn*O = 32777
Hub = (Ntrneq-O)/(I+O+1) = 32776/7 ~ 4682
For H << Hub, try H <= Hub/10, i.e. H < 468
The reason for the quirky numbers is that your database is HUGE!
I would just start with the default H = 10 with Ntrials = 10 and keep doubling H until success. Then I would consider reducing H by filling in the gaps between values already tried.
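The double-until-success search Greg describes might look like this. A sketch only; MSEgoal stands for whatever error target you choose, and x, t are your data:

```matlab
% Coarse search: double H until the error goal is met, capped at Hub/10.
H = 10; Ntrials = 10;
bestErr = Inf;
while H <= 468                           % Hub/10 bound from above
    trialErr = zeros(1, Ntrials);
    for k = 1:Ntrials
        net = fitnet(H);                 % fresh random weights each trial
        net.trainParam.showWindow = false;
        net = train(net, x, t);
        trialErr(k) = perform(net, t, net(x));
    end
    bestErr = min(trialErr);
    if bestErr <= MSEgoal, break; end    % success: stop doubling
    H = 2*H;                             % otherwise double H and retry
end
% Then refine by filling in the gaps, e.g. values between H/2 and H.
```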
Hope this helps.
Greg
3 Comments
Greg Heath
13 Jul 2017
With I = 17 input features and O = 1 output feature, I am usually comfortable with
Ntrn > 100*max(I,O) = 1700.
Then
N >~ Ntrn/0.7 ~ 2430
In your case that would yield
H << (1700 - 1)/(17 + 1 + 1) ~ 89
So, I would start with the default H = 10 and use the old double-or-half strategy to narrow in. Or just set up a grid Hmin:dH:Hmax with Ntrials = 10 for each value of H that is tested.
Greg
Hamza Ali
9 Oct 2017
Greg Heath
10 Oct 2017
The best way to judge is to state, a priori, how much error you will accept.
The simplest model is output = constant. To minimize the mean-square error, that constant should be the target mean
output = mean(target,2)
and the resulting MSE is the mean biased target variance.
vart1 = mse(target - mean(target,2))
= mean(var(target',1))
I am usually satisfied with the goal
MSEgoal = 0.01 * vart1
which yields the R-squared statistic (see Wikipedia)
Rsquare = 1 - MSE/vart1 = 0.99
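Greg's reference-model calculation can be written out directly. A sketch, where t is your O-by-N target matrix and net is an already-trained network:

```matlab
% Naive constant model: predict the row-wise mean of the targets.
ybar  = mean(t, 2);                % O-by-1 target mean
vart1 = mean(var(t', 1));          % mean biased target variance
% Equivalent: mse(t - ybar), the constant model's mean-square error.

MSEgoal = 0.01 * vart1;            % accept 1% of the naive model's MSE

% After training, judge the fit against the naive baseline:
MSE     = mse(net, t, net(x));
Rsquare = 1 - MSE/vart1;           % = 0.99 when MSE == MSEgoal
```

The point of normalizing by vart1 is that Rsquare measures how much better the network is than simply predicting the mean, independent of the targets' scale.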
Hope this helps.
Greg