In the following code, why is the classification accuracy (acc1, acc2) calculated differently?

Question

Y. K., 7 November 2019


https://kr.mathworks.com/matlabcentral/answers/489717-in-the-following-code-why-is-the-classification-accuracy-acc1-acc2-calculated-differently

Edited: michio, 18 November 2019

The following function (knn_test) takes three parameters: X holds the dataset samples, Y holds the dataset labels, and filterIndex is a column filter. To use a column, I set its entry in filterIndex to 1.
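As a small illustration of how filterIndex selects columns, here is a sketch on a hypothetical 5-by-4 matrix (the data and the filter values are made up for the example). It also shows that logical indexing with `filterIndex == 1` selects the same columns without calling `find`.

```matlab
% Hypothetical data: 5 samples, 4 features (column j holds 5*(j-1)+1 ... 5*j)
X = reshape(1:20, 5, 4);
filterIndex = [1 0 1 1];                      % keep columns 1, 3 and 4

% Approach used in knn_test: convert the flags to column indices
columnfilterIndex = find(filterIndex == 1);   % [1 3 4]
Xsel = X(:, columnfilterIndex);

% Equivalent logical indexing, no find needed
Xsel2 = X(:, filterIndex == 1);
```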

I want to compare two hold-out validation approaches: one using the crossval function and the other using the cvpartition function. However, when I call this function, acc1 and acc2 have different values.

I set a breakpoint and debugged the code. I examined the Partition property of CVKNNModel, and it was the same as the partition c in Model 2.
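Instead of relying on two `rng('default')` calls producing the same split, one option is to create the cvpartition once and hand it to crossval through its 'CVPartition' name-value argument, so both approaches share the split by construction. The sketch below uses the fisheriris sample data as a stand-in for the original dataset, which is not available here.

```matlab
% Stand-in data (assumption): fisheriris ships with the Statistics and ML Toolbox
load fisheriris                      % meas: 150x4 features, species: 150x1 cell labels

rng('default');
c = cvpartition(species, 'HoldOut', 0.3);   % one stratified hold-out split

mdl = fitcknn(meas, species, 'NumNeighbors', 3, 'Standardize', true);
cvmdl = crossval(mdl, 'CVPartition', c);    % reuse exactly this split

% The partitioned model stores the partition it used; compare its test set with c
sameSplit = isequal(test(cvmdl.Partition, 1), test(c, 1));
```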

What could have gone wrong with the following code?

Why did these two accuracy variables take different values?

If I want to use this function for holdout classification, which model should I choose?

Thanks.

```matlab
function [acc1,acc2]=knn_test(X,Y,filterIndex)
% Keep only the feature columns flagged with 1 in filterIndex
columnfilterIndex = find(filterIndex==1);

%Model 1: fit on the whole dataset, then hold-out validation via crossval
tra1Data = X(:,columnfilterIndex);
tra1Label=Y;
KNNModel1=fitcknn(tra1Data, tra1Label, 'Distance', 'Euclidean', 'NumNeighbors', 3, 'DistanceWeight', 'Equal', 'Standardize', true);
rng('default');
CVKNNModel = crossval(KNNModel1,'holdout',0.3);
loss=kfoldLoss(CVKNNModel);
acc1=1-loss;

%Model 2: manual hold-out split via cvpartition
rng('default');
c = cvpartition(Y,'HoldOut',0.3);
tra2Data=X(c.training,columnfilterIndex);
tra2Label=Y(c.training,:);
test2Data=X(c.test,columnfilterIndex);
test2Label=Y(c.test,:);
KNNModel2 = fitcknn(tra2Data,tra2Label,'Distance', 'Euclidean','NumNeighbors',3, 'DistanceWeight', 'Equal','Standardize', true);
pre_test = predict(KNNModel2,test2Data);
% Fraction of test samples predicted correctly (== needs numeric or categorical labels)
correctPredictions = (pre_test == test2Label);
acc2 = sum(correctPredictions)/length(correctPredictions);
%perf=classperf(uint8(test2Label),uint8(pre_test)); 
%acc2=perf.CorrectRate;
end
```

2 Comments

michio, 12 November 2019


Could you provide a script that reproduces the issue? I ran the following, and acc1 and acc2 are the same.

```matlab
load ionosphere
[acc1,acc2]=knn_test(X,Y,ones(size(X,2),1))
```

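where knn_test is defined as follows. Note the line correctPredictions = (string(pre_test) == string(test2Label)); which avoids an error because the ionosphere labels are a cell array of character vectors, and == is not defined for cell arrays.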

```matlab
function [acc1,acc2]=knn_test(X,Y,filterIndex)
columnfilterIndex = find(filterIndex==1);
%Model 1
tra1Data = X(:,[columnfilterIndex]);
tra1Label=Y;
KNNModel1=fitcknn(tra1Data, tra1Label, 'Distance', 'Euclidean', 'NumNeighbors', 3, 'DistanceWeight', 'Equal', 'Standardize', true);
rng('default');
CVKNNModel = crossval(KNNModel1,'holdout',0.3);
loss=kfoldLoss(CVKNNModel);
acc1=1-loss;
%Model 2
rng('default');
c = cvpartition(Y,'HoldOut',0.3);
tra2Data=X(c.training,[columnfilterIndex]);
tra2Label=Y(c.training,:);
test2Data=X(c.test,[columnfilterIndex]);
test2Label=Y(c.test,:);
KNNModel2 = fitcknn(tra2Data,tra2Label,'Distance', 'Euclidean','NumNeighbors',3, 'DistanceWeight', 'Equal','Standardize', true);
pre_test = predict(KNNModel2,test2Data);
% correctPredictions = (pre_test == test2Label);
correctPredictions = (string(pre_test) == string(test2Label));
acc2 = sum(correctPredictions)/length(correctPredictions);
%perf=classperf(uint8(test2Label),uint8(pre_test)); 
%acc2=perf.CorrectRate;
end
```
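For reference, here is a minimal sketch of two ways to compute accuracy when the labels are a cell array of character vectors, as in ionosphere. The label values below are made up for illustration; `strcmp` compares cell arrays elementwise and is an alternative to converting both sides with `string`.

```matlab
% Hypothetical labels stored as cell arrays of character vectors
truth = {'g'; 'b'; 'g'; 'g'};
pred  = {'g'; 'g'; 'g'; 'b'};

% truth == pred would error: '==' is not supported for cell operands.
acc_string = mean(string(pred) == string(truth));  % convert to string arrays first
acc_strcmp = mean(strcmp(pred, truth));            % elementwise comparison of cellstr
```

Both give 0.5 here, since only the first and third predictions match.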

Y. K., 13 November 2019

Edited: Y. K., 14 November 2019

I couldn't attach the test data because its size exceeds 5 MB, so I sent it to you by e-mail.

Thanks for your interest.


Answer 1 (accepted)

michio, 18 November 2019


https://kr.mathworks.com/matlabcentral/answers/489717-in-the-following-code-why-is-the-classification-accuracy-acc1-acc2-calculated-differently#answer_401965

Edited: michio, 18 November 2019


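The two ways of hold-out cross-validation that you described differ in a subtle way: the class prior. For Model 1, calling crossval on KNNModel1 reuses the prior that fitcknn estimated from the whole dataset. For Model 2, fitcknn estimates the prior from the training partition only (tra2Label). Because fitcknn normalizes the observation weights in each class to sum to that class's prior, the two models can weight the neighbors' votes slightly differently, which can change some predictions. If you specify the same prior, you should get the same results: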

```matlab
KNNModel2 = fitcknn(tra2Data,tra2Label,...
    'Distance', 'Euclidean','NumNeighbors',3, ...
    'DistanceWeight', 'Equal','Standardize', true, 'Prior', KNNModel1.Prior);
```
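Putting both fixes together, a sketch of the comparison might look like the script below: one shared cvpartition, and Model 2 forced to use the whole-dataset prior. It uses fisheriris as stand-in data (an assumption, since the original dataset isn't available), and it also passes 'ClassNames' so that the numeric prior vector is matched to the classes in the same order as in KNNModel1; that extra argument is a precaution, not part of the answer above.

```matlab
% Stand-in data (assumption): fisheriris, all feature columns selected
load fisheriris
X = meas;
Y = species;
filterIndex = ones(1, size(X, 2));
Xs = X(:, find(filterIndex == 1));

% kNN settings copied from the question's fitcknn calls
knnOpts = {'Distance', 'Euclidean', 'NumNeighbors', 3, ...
           'DistanceWeight', 'Equal', 'Standardize', true};

% One hold-out partition shared by both models
rng('default');
c = cvpartition(Y, 'HoldOut', 0.3);

% Model 1: crossval on a model fitted to the whole dataset
KNNModel1 = fitcknn(Xs, Y, knnOpts{:});
CVKNNModel = crossval(KNNModel1, 'CVPartition', c);
acc1 = 1 - kfoldLoss(CVKNNModel);

% Model 2: manual hold-out, reusing the whole-dataset prior and class order
KNNModel2 = fitcknn(Xs(training(c), :), Y(training(c)), knnOpts{:}, ...
                    'ClassNames', KNNModel1.ClassNames, 'Prior', KNNModel1.Prior);
pre_test = predict(KNNModel2, Xs(test(c), :));
acc2 = mean(string(pre_test) == string(Y(test(c))));
```

With the same split and the same prior, acc1 and acc2 are expected to agree up to floating-point rounding.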


Answer 2

Y. K., 12 November 2019


https://kr.mathworks.com/matlabcentral/answers/489717-in-the-following-code-why-is-the-classification-accuracy-acc1-acc2-calculated-differently#answer_401077

In addition, this code gives the same results on small datasets, but as the number of features increases it produces different results, especially on my dataset.
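One way to check whether the prior explanation applies to a particular dataset is to print the prior each model actually uses. The sketch below (again with fisheriris as stand-in data, and only the default kNN options) fits one model on all labels and one on the training rows only, then compares their Prior vectors; any nonzero gap means the per-class observation weights, and therefore close neighbor votes, can differ between the two approaches.

```matlab
% Stand-in data (assumption): fisheriris
load fisheriris
X = meas;
Y = species;

rng('default');
c = cvpartition(Y, 'HoldOut', 0.3);

% Empirical prior from all labels vs. from the training rows only
mdlFull  = fitcknn(X, Y, 'NumNeighbors', 3);
mdlTrain = fitcknn(X(training(c), :), Y(training(c)), 'NumNeighbors', 3);

priorGap = max(abs(mdlFull.Prior(:) - mdlTrain.Prior(:)));
disp(table(mdlFull.ClassNames, mdlFull.Prior(:), mdlTrain.Prior(:), ...
    'VariableNames', {'Class', 'PriorFull', 'PriorTrain'}));
```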
