Classifier not working properly on test set

Question

Warid Islam, 14 July 2020

https://kr.mathworks.com/matlabcentral/answers/564740-classifier-not-working-properly-on-test-set

Answered: Nipun Katyal, 21 July 2020

new7.xlsx

Hello,

I have trained an SVM classifier on a breast cancer feature set. I get a 10-fold cross-validation accuracy of 83% on the training set, but the accuracy on the test set is very poor. The data set has 1999 observations and 9 features. The training-to-test split is 60:40 (the first 1200 rows for training, the remaining 799 rows for testing). Any suggestions would be very much appreciated. Thank you.

new7 = readtable('new7.xlsx');   % import the attached spreadsheet as a table
X_train=table2array(new7(1:1200,1:9));
y_train=table2array(new7(1:1200,10));
X_test=table2array(new7(1201:1999,1:9));
y_test=table2array(new7(1201:1999,10));

Mdl = fitcsvm(...
    X_train, ...
    y_train, ...
    'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName',...
    'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % convert from categorical to double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100;
%% save model
saveCompactModel(Mdl,'mySVM');
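A quick way to check whether a sequential split is the problem is to count how many observations of each class end up in each part. The sketch below uses hypothetical, synthetic labels sorted by class (1000 rows of class 1 followed by 999 rows of class 2, an assumption chosen only to mimic a sorted 1999-row file); with the real data you would count `y_train` and `y_test` directly.

```matlab
% Hypothetical labels sorted by class, as in a file where all class 1 rows
% come before all class 2 rows (the 1000/999 counts are an assumption).
y = [ones(1000,1); 2*ones(999,1)];

% Sequential 1200/799 split by row position, as in the question
y_train = y(1:1200);
y_test  = y(1201:end);

% Count the observations of each class in each part
classes     = unique(y);
trainCounts = arrayfun(@(c) sum(y_train == c), classes);
testCounts  = arrayfun(@(c) sum(y_test  == c), classes);
disp(table(classes, trainCounts, testCounts))
```

With this ordering the training part holds 1000 class 1 rows and 200 class 2 rows, while the test part contains only class 2 rows, so the model is tested on a class mix it barely saw during training.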


Answer 1

Nipun Katyal, 21 July 2020

https://kr.mathworks.com/matlabcentral/answers/564740-classifier-not-working-properly-on-test-set#answer_468813

clc
clear all
rawData = xlsread('new7.xlsx');
[m,n] = size(rawData);
new7 = rawData(randperm(m),:);   % shuffle the rows before splitting
X_train=new7(1:1200,1:9);
y_train=new7(1:1200,10);
X_test=new7(1201:end,1:9);
y_test=new7(1201:end,10);

Mdl = fitcsvm(...
    X_train, ...
    y_train, ...
    'KernelFunction','rbf',...
    'OptimizeHyperparameters','auto',...
    'HyperparameterOptimizationOptions',...
    struct('AcquisitionFunctionName',...
    'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % convert from categorical to double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100
%% save model
saveCompactModel(Mdl,'mySVM');

If you look at your data set, you will see that the rows are sorted by class: the class 1 observations are grouped together, followed by the class 2 observations. Because the split is made by row position, the training set (on which the cross-validation accuracy is computed) contains mostly class 1 and some class 2 observations, while the test set contains the opposite. The model is therefore tested on a class distribution very different from the one it was trained and validated on, which is why the test accuracy is poor. Before splitting the data into training and test sets, shuffle the rows so that the classes are distributed roughly evenly across the two sets.
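Shuffling with `randperm` balances the classes only on average. If you want both sets to keep the same class proportions as the full data, one alternative (not part of the answer above) is a stratified hold-out split with `cvpartition`, which stratifies by class when given the label vector. This is a minimal sketch assuming the same layout as the answer's code (features in columns 1 to 9, numeric labels in column 10 of `rawData`); the synthetic `rawData` below is a hypothetical stand-in for the spreadsheet, and `rng` is only there to make the split reproducible.

```matlab
% Hypothetical stand-in for rawData = xlsread('new7.xlsx'):
% 1999 rows, 9 random features, labels sorted by class (assumption).
rng(0)                                   % reproducible data and split
rawData = [rand(1999,9), [ones(1000,1); 2*ones(999,1)]];

X = rawData(:,1:9);
y = rawData(:,10);

% Stratified 60/40 hold-out: passing the labels as the grouping variable
% makes cvpartition keep about the same class proportions in both parts.
c = cvpartition(y, 'Holdout', 0.4);

X_train = X(training(c),:);  y_train = y(training(c));
X_test  = X(test(c),:);      y_test  = y(test(c));

% The class 1 fraction should be close to the full data's in both parts
fracTrain = mean(y_train == 1);
fracTest  = mean(y_test  == 1);
fprintf('Class 1 fraction: train %.3f, test %.3f\n', fracTrain, fracTest)
```

The resulting `X_train` and `y_train` can be passed to the same `fitcsvm` call as before; only the way the rows are assigned to the two sets changes.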
