Classifier not working properly on test set

조회 수: 2 (최근 30일)
Warid Islam
Warid Islam 2020년 7월 14일
답변: Nipun Katyal 2020년 7월 21일
Hello,
I have trained a SVM classifier on a breast cancer feature set. I get a validation accuracy of 83% on the training set but the accuracy is very poor on the test set. The data set has 1999 observations and 9 features.The training set to test set ratio is 0.6:0.4. Any suggestions would be very much appreciated. Thank you.
X_train=table2array(new7(1:1200,1:9));
y_train=table2array(new7(1:1200,10));
X_test=table2array(new7(1201:1999,1:9));
y_test=table2array(new7(1201:1999,10));
Mdl = fitcsvm(...
X_train, ...
y_train, ...
'KernelFunction','rbf',...
'OptimizeHyperparameters','auto',...
'HyperparameterOptimizationOptions',...
struct('AcquisitionFunctionName',...
'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % chuyen tu categorical sang dang double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100;
%% save model
saveCompactModel(Mdl,'mySVM');

답변 (1개)

Nipun Katyal
Nipun Katyal 2020년 7월 21일
clc
clear all
rawData = xlsread('new7.xlsx');
[m,n] = size(rawData);
new7 = rawData(randperm(m),:);
X_train=new7(1:1200,1:9);
y_train=new7(1:1200,10);
X_test=new7(1201:1999,1:9);
y_test=new7(1201:1999,10);
Mdl = fitcsvm(...
X_train, ...
y_train, ...
'KernelFunction','rbf',...
'OptimizeHyperparameters','auto',...
'HyperparameterOptimizationOptions',...
struct('AcquisitionFunctionName',...
'expected-improvement-plus'));
% Perform cross-validation
partitionedModel = crossval(Mdl, 'KFold', 10);
% Compute validation predictions
[validationPredictions, validationScores] = kfoldPredict(partitionedModel);
% Compute validation accuracy
validation_error = kfoldLoss(partitionedModel, 'LossFun', 'ClassifError'); % validation error
validationAccuracy = 1 - validation_error;
%% test model
oofLabel_n = predict(Mdl,X_test);
oofLabel_n = double(oofLabel_n); % chuyen tu categorical sang dang double
test_accuracy_for_iter = sum((oofLabel_n == (y_test)))/length(y_test)*100
%% save model
saveCompactModel(Mdl,'mySVM');
On observing your data set you will find that the labels for class 1 and 2 are clubbed together which leaves a discrepancy in the validation set and test set in which the validation set contained a majority of class 1 and some class 2 features while you test set had the opposite which resulted in poor accuracy, so before splitting the data into validation and test you should jumble the rows, so as to provide an equal distribution of classes across the two sets.

카테고리

Help CenterFile Exchange에서 AI for Signals and Images에 대해 자세히 알아보기

제품


릴리스

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by