SVM: How is the classification error with leave-one-out cross validation calculated?

Question

David Schubert on 20 Jul 2015


https://kr.mathworks.com/matlabcentral/answers/230496-svm-how-is-the-classification-error-with-leave-one-out-cross-validation-calculated

Commented: Ilya on 21 Jul 2015

I am trying to understand what MATLAB's leave-one-out cross-validation of an SVM does by comparing it to a leave-one-out cross-validation I wrote myself. Unfortunately, I do not get the same results.

First I create some random data

rng(0);
X = rand(10,20);
Y = [ones(5,1); zeros(5,1)];
n_samples = size(X,1);

Then I calculate the classification error with leave-one-out cross validation

CVSVMModel = fitcsvm(X, Y,...
    'KernelFunction','rbf',...
    'BoxConstraint',1,...
    'LeaveOut', 'on',...
    'CacheSize', 'maximal');
error1 = kfoldLoss(CVSVMModel, 'lossfun', 'classiferror')

Now I try to do the same by hand. For each iteration, one sample is taken out of the training set and predicted afterwards.

error2 = 0;
for fold = 1:n_samples
    idx = [1:(fold-1), (fold+1):n_samples];
    SVMModel = fitcsvm(X(idx,:), Y(idx),...
        'KernelFunction','rbf',...
        'BoxConstraint',1,...
        'CacheSize', 'maximal');
    label = predict(SVMModel, X(fold,:));
    error2 = error2 + (label~=Y(fold));
end      
error2 = error2/n_samples

However, I do not get the same results:

error1 =
      0.7000
error2 =
       1

Can anyone tell me why?

What also worries me: Why does the second method perfectly misclassify every point (error2=1)? This can't be a coincidence.


Accepted Answer

Ilya on 20 Jul 2015


https://kr.mathworks.com/matlabcentral/answers/230496-svm-how-is-the-classification-error-with-leave-one-out-cross-validation-calculated#answer_186632

fitcsvm passes the class prior probabilities computed from the entire data set into each fold. Look at CVSVMModel.Trained{1}.Prior, CVSVMModel.Trained{2}.Prior, and so on: every time you should see [0.5 0.5]. When you cross-validate yourself, the priors are derived for each fold independently, so in each fold you get 5/9 for one class and 4/9 for the other. This explains the difference.
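The difference in priors can be inspected directly. The following is a minimal sketch, not a verified reproduction: it reuses the data from the question, reads the Prior property from one built-in fold and from one manually trained model, and then repeats the manual loop with the documented 'Prior','uniform' option of fitcsvm. The names cvPrior, manualPrior, mdl, and error3 are introduced here for illustration only. If the priors are the only difference, error3 would be expected to agree with error1 from the question, but that outcome has not been checked here.

```matlab
% Data from the question
rng(0);
X = rand(10,20);
Y = [ones(5,1); zeros(5,1)];
n_samples = size(X,1);

% Built-in leave-one-out cross-validation
CVSVMModel = fitcsvm(X, Y, ...
    'KernelFunction','rbf', ...
    'BoxConstraint',1, ...
    'LeaveOut','on');
cvPrior = CVSVMModel.Trained{1}.Prior     % priors stored in one fold's model

% One manually trained fold (observation 1 left out)
idx = 2:n_samples;
manualModel = fitcsvm(X(idx,:), Y(idx), ...
    'KernelFunction','rbf', ...
    'BoxConstraint',1);
manualPrior = manualModel.Prior           % empirical priors of the 9 training points

% Assumption: forcing uniform priors in the manual loop mimics what the
% built-in cross-validation does for this balanced data set.
error3 = 0;
for fold = 1:n_samples
    idx = [1:(fold-1), (fold+1):n_samples];
    mdl = fitcsvm(X(idx,:), Y(idx), ...
        'KernelFunction','rbf', ...
        'BoxConstraint',1, ...
        'Prior','uniform');
    error3 = error3 + (predict(mdl, X(fold,:)) ~= Y(fold));
end
error3 = error3/n_samples
```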

As for why the second method errs 100% of the time, see my answer here. In short, the left-out label is always opposite to the majority class in the training set.
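The majority-class argument can be checked without training any model. This short sketch uses only the labels from the question; the variable names majority and alwaysOpposite are illustrative. With 5 labels per class, removing one observation always leaves 4 of its own class and 5 of the other, so a classifier that always predicts the training majority is wrong on every held-out point.

```matlab
Y = [ones(5,1); zeros(5,1)];          % labels from the question
n_samples = numel(Y);
alwaysOpposite = true(n_samples,1);
for fold = 1:n_samples
    idx = [1:(fold-1), (fold+1):n_samples];
    majority = mode(Y(idx));          % most frequent label among the 9 training points
    alwaysOpposite(fold) = (majority ~= Y(fold));
end
all(alwaysOpposite)                   % true: the held-out label is never the training majority
```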


David Schubert on 21 Jul 2015

Edited: David Schubert on 21 Jul 2015

Sorry, one more question: How do the prior probabilities affect the prediction? Here I just find that class(z)=sign(<w,z>+b), but I don't see how this takes the prior probabilities into account.

Ilya on 21 Jul 2015

Instead of using one box constraint for all observations, fitcsvm sets individual box constraints to C*N*wn, where C is the value you pass, N is the total number of observations, and wn is the observation weight, which is proportional to the prior probability of the observation's class. In your case, each fold of the built-in cross-validation has 5 observations of one class weighted at 0.1 and 4 observations of the other class weighted at 0.125. Observations of the minority class can therefore have larger alpha coefficients, which is why the model sometimes predicts the minority class. In your manual cross-validation, by contrast, all observations have equal weights and the model always predicts the majority class.
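As a worked illustration of the numbers above (a sketch using standard soft-margin SVM notation, with the symbols $C_n$, $\alpha_n$, $y_n \in \{-1,+1\}$, and kernel $K$ introduced here as assumptions rather than taken from the thread): the priors do not appear in the decision function itself. They enter only through the upper bounds on the dual coefficients.

```latex
% Decision function: no explicit dependence on the priors
f(z) = \sum_{n=1}^{N} \alpha_n \, y_n \, K(x_n, z) + b,
\qquad \text{class}(z) = \operatorname{sign} f(z)

% Per-observation box constraint in the dual problem
0 \le \alpha_n \le C_n, \qquad C_n = C \, N \, w_n

% Built-in cross-validation, priors [0.5 0.5], C = 1, N = 9
\text{majority class (5 points): } w_n = 0.5/5 = 0.1
  \;\Rightarrow\; C_n = 1 \cdot 9 \cdot 0.1 = 0.9
\text{minority class (4 points): } w_n = 0.5/4 = 0.125
  \;\Rightarrow\; C_n = 1 \cdot 9 \cdot 0.125 = 1.125

% Manual cross-validation, equal weights
w_n = 1/9 \;\Rightarrow\; C_n = 1 \cdot 9 \cdot \tfrac{1}{9} = 1
  \text{ for every observation}
```

Under these assumptions, the minority-class points in the built-in folds may take coefficients up to 1.125 while the majority-class points are capped at 0.9, whereas the manual folds cap all points at 1.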
