Does a neural network work better with a small dataset than a large one?

Views: 1 (last 30 days)
afef
afef on 7 Jun 2017
Commented: afef on 11 Jun 2017
Hi, I created a neural network using nprtool. At first I used a 9*981 input matrix and got 65% accuracy in the confusion matrix; then I reduced the number of samples, used a 9*102 input matrix, and got 94.1% accuracy. Is this possible and correct? I would also like to know the reason for it.
Thanks

Accepted Answer

Jeong_evolution
Jeong_evolution on 7 Jun 2017
Edited: Jeong_evolution on 7 Jun 2017
If the input parameters in the historical dataset (9*102) are highly correlated with (important to) the target, it is possible. When the historical dataset is increased to 9*981, I think its correlation with, or importance to, the target decreases.
  3 Comments
Jeong_evolution
Jeong_evolution on 7 Jun 2017
Edited: Jeong_evolution on 7 Jun 2017
Input parameter = input
target = output
historical dataset = input + output (= all of the data)
If you let me know the characteristics of the dataset, I will tell you as much as I know.
afef
afef on 10 Jun 2017
I have some statistical features extracted from an EEG signal to detect epileptic seizures; this is part of the input and target that I used.


More Answers (2)

Jeong_evolution
Jeong_evolution on 7 Jun 2017
In addition, you have to select input parameters that are more strongly related to the target before using the NN.
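One simple way to screen inputs as suggested here is to rank each feature by the absolute value of its correlation with the target. The following is a minimal, illustrative Python sketch (the 9-feature shape matches the question, but the function names and toy data are invented for this example; it is not nprtool's method):

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_features(X, y):
    """Rank the feature rows of X by |correlation| with target y, strongest first."""
    corrs = [pearson(row, y) for row in X]
    order = sorted(range(len(X)), key=lambda i: -abs(corrs[i]))
    return order, corrs

# Toy data: 9 features x 100 samples; feature 3 drives the binary target.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(100)] for _ in range(9)]
y = [1.0 if x3 > 0 else 0.0 for x3 in X[3]]

order, corrs = rank_features(X, y)
print(order[0])  # feature 3 ranks first
```

Keeping only the top-ranked features before training is one rough way to test whether the weaker features in the larger dataset are diluting performance.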

Greg Heath
Greg Heath on 10 Jun 2017
With respect to the original question:
You really cannot deduce anything worthwhile about performance on the N = 981 dataset by using one subset of n = 102. Also, it is not clear if the 102 are all training data or are divided into trn/val/tst subsets.
A more rigorous approach would be to use m-fold cross validation which uses data RANDOMLY divided into m subsets of size M ~= 981/m. This can be repeated as many times as you want because all of the data is randomly distributed. In particular you can optimize m and separate the 3 trn/val/tst performances.
Note that this is different from traditional stratified m-fold crossval where each point is only in one of the m subsets. However, it is MUCH easier to implement and can be repeated as many times as needed to reduce prediction uncertainties.
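The repeated random division described above can be sketched as pure index bookkeeping; the actual network training (e.g. the script nprtool generates) would be plugged in per split. A minimal Python sketch under those assumptions (all names here are illustrative, not a MATLAB API):

```python
import random

def random_m_fold_indices(n, m, rng):
    """Randomly split indices 0..n-1 into m near-equal folds (one repeat)."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[k::m] for k in range(m)]

def repeated_cv(n, m, repeats, seed=0):
    """Repeat the random m-fold division `repeats` times, as Greg describes.

    Yields (repeat, fold, train_idx, test_idx); a point may land in
    different folds on different repeats, unlike stratified m-fold CV.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        folds = random_m_fold_indices(n, m, rng)
        for k, test_idx in enumerate(folds):
            train_idx = [i for f in range(m) if f != k for i in folds[f]]
            yield r, k, train_idx, test_idx

# Example: N = 981 samples, m = 10 folds, 3 repeats -> 30 train/test splits.
splits = list(repeated_cv(981, 10, 3))
print(len(splits))  # 30
r, k, train_idx, test_idx = splits[0]
print(len(train_idx) + len(test_idx))  # 981
```

In MATLAB itself, cvpartition (Statistics and Machine Learning Toolbox) provides comparable random hold-out and k-fold index generation.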
Hope this helps.
Thank you for formally accepting my answer
Greg
  1 Comment
afef
afef on 11 Jun 2017
At first I used a dataset with N = 981, and because I didn't get good accuracy I tried a smaller dataset with N = 102 to see whether the performance would be better. Concerning the m-fold cross-validation, how could I do it, please?

