How to use KNN to classify data in MATLAB?

9 views (last 30 days)
Pedro on 11 Jul 2014
Commented: merlin toche on 2 Dec 2022
I'm having problems understanding how K-NN classification works in MATLAB. Here's the problem: I have a large dataset (65 features for over 1500 subjects) and its respective class labels (0 or 1). According to what's been explained to me, I have to divide the data into training, test and validation subsets to perform supervised training on the data, and classify it via K-NN. First of all, what's the best ratio for dividing it into the 3 subgroups (1/3 of the dataset each)?
I've looked into the ClassificationKNN/fitcknn functions, as well as the crossval function (ideally to divide the data), but I'm really not sure how to use them.
To sum up, I want to:
- divide the data into 3 groups
- "train" the KNN (I know it's not a method that requires training, but the equivalent of training) with the training subset
- classify the test subset and get its classification error/performance
Also, what's the point of having a validation set?
I hope you can help me, thank you in advance
  1 Comment
merlin toche on 2 Dec 2022
Hi,
Can anyone please help me with how to create an open-circuit inverter fault model in Simulink? (A tutorial would be great, please.)

Sign in to comment.

Answers (1)

Shashank Prasanna on 11 Jul 2014
You need a validation set if you want to tune certain parameters of the classifier. For example, if you were to use an SVM with an RBF kernel, you could choose the kernel parameters using the validation set. A model trained on the training data is then tested on the test data to see how it performs on unseen data. If these concepts are not clear, I recommend reading some literature on this topic before proceeding.
The approach you mention is holdout. There are other ways to do cross validation. For holdout, how much of the data to put in each subset is up to you and, of course, depends on the nature of the dataset.
Take a look at how you can use cvpartition to try different cross validation techniques:
All the functions have examples you can run directly.
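As a minimal holdout sketch with cvpartition and fitcknn (the data here is random and the choice of 5 neighbors is arbitrary, purely for illustration):

rng(1);                                % reproducible random example data
X = rand(1500, 65);                    % 1500 subjects x 65 features
y = randi([0 1], 1500, 1);             % binary class labels
c = cvpartition(size(X,1), 'HoldOut', 0.2);   % 80% train, 20% test
Xtrain = X(training(c), :);  ytrain = y(training(c));
Xtest  = X(test(c), :);      ytest  = y(test(c));
mdl = fitcknn(Xtrain, ytrain, 'NumNeighbors', 5);
testError = loss(mdl, Xtest, ytest);   % misclassification rate on held-out data

Changing 'HoldOut' to 'KFold' in cvpartition gives k-fold cross validation instead of a single split.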
  4 Comments
Pedro on 11 Jul 2014
I think I was able to do it but, if it's not asking too much, could you see if I missed something? This is my code, for a random case:
nfeats = 60; ninds = 1000;
trainRatio = 0.8; valRatio = 0.1; testRatio = 0.1;
kmax = 100; % for instance...
data = randi(100, nfeats, ninds);
class = randi(2, 1, ninds);
[trainInd, valInd, testInd] = dividerand(ninds, trainRatio, valRatio, testRatio);
train = data(:, trainInd);
test = data(:, testInd);
val = data(:, valInd);
train_class = class(:, trainInd);
test_class = class(:, testInd);
val_class = class(:, valInd);
precisionmax = 0;
koptimal = 0;
for know = 1:kmax
    % is it the same thing to use knnclassify or fitcknn+predict??
    predicted_class = knnclassify(val', train', train_class', know);
    mdl = fitcknn(train', train_class', 'NumNeighbors', know);
    label = predict(mdl, val');
    consistency = sum(label == val_class') / length(val_class);
    if consistency > precisionmax
        precisionmax = consistency;
        koptimal = know;
    end
end
% use the tuned koptimal (not the last loop value know) for the final model,
% and evaluate that model, not the one from the loop
mdl_final = fitcknn(train', train_class', 'NumNeighbors', koptimal);
label_final = predict(mdl_final, test');
consistency_final = sum(label_final == test_class') / length(test_class);
Thank you very much for all your help, Shashank Prasanna.
Pedro on 11 Jul 2014
I don't know if there's a better way of comparing the true test labels and the predicted labels...
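One common option beyond a raw accuracy ratio is a confusion matrix; a short sketch using confusionmat, reusing the variable names from the code above:

C = confusionmat(test_class', label_final);   % rows: true class, columns: predicted class
accuracy = sum(diag(C)) / sum(C(:));          % overall fraction classified correctly

The off-diagonal entries of C show which class is being confused with which, which a single accuracy number hides.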

Sign in to comment.
