classificationLearner/machine learning question
Hi all,
Sorry for the potentially basic nature of this question. I am looking to use machine learning (e.g. an SVM) to determine whether certain features in neural data can indicate performance in a task. This is purely a binary classification. I have started with the classificationLearner app, just to get familiarised, and then exported the code to work with my dataset within my own script.
My question: when inputting all of my data into classificationLearner, can I take the model accuracy reported after k-fold cross-validation as a proxy for performance on the entire dataset? That is, to determine whether all my features are suitable predictors of the performance or stimuli presented, is it valid to input all my data into classificationLearner (or the code it generates) and use the validationAccuracy output (following k-fold cross-validation) as my model performance for the entire dataset?
Furthermore, if this is an acceptable approach, is there a way of stratifying the data during training/cross-validation so that a (roughly) even number of each class goes into each fold?
I guess my thinking is that if I do k-fold cross-validation on the entire dataset, I am essentially retraining and testing the model each time (either using a leave-one-out strategy or holding out a certain percentage of the data for testing), and I can therefore use the average accuracy as my model performance. Is this correct, or wildly off the mark?
I very much appreciate any help and input!
Accepted Answer
Puru Kathuria
15 July 2020
Hi,
I understand you are trying to find a metric to measure your model performance.
K-fold: Usually, we split the dataset into a training set and a test set, train the model on the training set, and evaluate it on the test set using an error metric to determine the accuracy of the model. This method, however, is not very reliable, as the accuracy obtained for one test set can be very different from the accuracy obtained for another. K-fold cross-validation (CV) addresses this by dividing the data into k folds and ensuring that each fold is used as the test set exactly once, with the remaining folds used for training. It can further help you in determining how well the model/learner fits the data.
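For illustration, a minimal sketch of computing the same k-fold validation accuracy in a script. Here X and Y are placeholder names (not from your exported code) for the feature matrix and label vector:
% Minimal sketch: 5-fold cross-validated SVM, assuming X is an N-by-P
% feature matrix and Y an N-by-1 vector of binary class labels.
cvModel = fitcsvm(X, Y, 'KFold', 5);          % partitioned SVM model
validationAccuracy = 1 - kfoldLoss(cvModel)   % mean accuracy over folds
kfoldLoss returns the average misclassification rate across the folds, which is essentially the quantity the app reports after cross-validation.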
Leave-one-out: Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set.
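The same sketch with leave-one-out validation (again using the placeholder X and Y):
% Leave-one-out CV: each observation is held out as the test set once.
looModel = fitcsvm(X, Y, 'Leaveout', 'on');
looAccuracy = 1 - kfoldLoss(looModel)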
So, yes, what you are doing is correct for your requirements.
For more information on the implementation of k-fold/stratified k-fold/leave-one-out, please visit the following link.
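Regarding your stratification question: a minimal sketch using cvpartition, which by default stratifies by class proportions when given the label vector (X and Y are the same placeholders as above):
% Stratified 5-fold partition: each fold keeps roughly the class
% proportions of Y, so both classes appear in every fold.
c = cvpartition(Y, 'KFold', 5);
cvModel = fitcsvm(X, Y, 'CVPartition', c);
stratifiedAccuracy = 1 - kfoldLoss(cvModel)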
3 Comments
Walter Roberson
15 July 2020
data = cat(1, image_patches,labels);
That code overwrites all of data on each iteration.
It looks to me as if data will not be a vector, but I cannot locate any hellopatches() function, so I cannot tell what shape it will be. As you are not calling imresize(), I also cannot be sure that all of the images are the same size, and therefore cannot be sure that data will be the same size on each iteration. Under the circumstances, you should consider saving into a cell array, as sketched below.
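Something along these lines (imageFiles is a hypothetical list of filenames, and hellopatches() is taken from your post):
% Hypothetical sketch: store each iteration's result in a cell array so
% differently sized outputs do not overwrite one another.
data = cell(numel(imageFiles), 1);      % imageFiles is a placeholder
for k = 1:numel(imageFiles)
    img = imread(imageFiles{k});
    data{k} = hellopatches(img);        % per-image result kept separately
end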
Note: please do not post the same query multiple times. I found at least 7 copies of your query :(