Exporting model to classify new data
Hi,
I have attached the code I use to classify my data. I use 16 different models. What I want to do is the following:
- I want to save/export each model, much like the Classification Learner app does, so that I can make predictions on new data.
- I want to plot a ROC curve with the AUC for each of the models.
How can I do that?
Accepted Answer
Ridwan Alam
18 Dec 2019
1. Save: use save() (assuming you want to save/export each classifier to a separate file).
2. ROC curve: use perfcurve() and plot(), with hold on to overlay the curves of the different models.
% Linear SVM
tic
classificationLinearSVM = fitcsvm(...
    trainingData(train,1:end-1), ...   % predictors
    trainingData(train,end), ...       % labels (last column)
    'KernelFunction', 'linear', ...
    'PolynomialOrder', [], ...
    'KernelScale', 'auto', ...
    'BoxConstraint', 1, ...
    'Standardize', true, ...
    'ClassNames', [0; 1]);
[predsLinSVM,~] = predict(classificationLinearSVM,trainingData(test,1:end-1));
targetLinSVM = trainingData(test,end);
targetsLinSVM_all = [targetsLinSVM_all; squeeze(targetLinSVM)];
predsLinSVM_all = [predsLinSVM_all; squeeze(predsLinSVM)];
t1 = toc;
% Save the trained classifier to its own MAT-file
save('classificationLinearSVM.mat','classificationLinearSVM','-v7.3');
% perfcurve needs the positive class label; 1 is assumed here (second entry of ClassNames)
posclass = 1;
% Posterior scores on the training observations (resubstitution)
[~,scoresLinSVM] = resubPredict(fitPosterior(classificationLinearSVM));
[xLinSVM,yLinSVM,~,aucLinSVM] = perfcurve(trainingData(train,end),scoresLinSVM(:,2),posclass);
plot(xLinSVM,yLinSVM); hold on;
Hope this helps!
9 Comments
Hi,
The code seems to work. However, fitPosterior only works for SVM. I also use k-Nearest Neighbor and Random Forest and this function will not work on those classifiers. Are there "fitPosterior" versions for these classifiers as well?
I get the following error:
Undefined function 'fitPosterior' for input arguments of type 'ClassificationKNN'.
I tried simply removing fitPosterior for the kNN and Random Forest classifiers, and it seems to work, but I am not sure that it is implemented correctly.
Another thing: when we use 'save' to save each of the trained classifiers, how do we then use them to make predictions on a new data set (code-wise)?
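On the fitPosterior question: that function is defined for SVM classifiers, whose raw scores are signed distances rather than probabilities. As far as I know, predict() on a ClassificationKNN model or on a bagged tree ensemble already returns per-class posterior probabilities as its second output, so those scores can be passed to perfcurve() directly. A minimal sketch on synthetic data follows; X, y, and all variable names are made up for illustration, and fitcensemble with 'Method','Bag' is an assumed stand-in for the random forest, which may differ from what the attached code uses.

```matlab
rng(0);                                     % reproducible synthetic data
X = [randn(100,2)+1; randn(100,2)-1];       % hypothetical predictors
y = [ones(100,1); zeros(100,1)];            % hypothetical 0/1 labels
cvp = cvpartition(y, 'HoldOut', 0.3);       % simple train/test split
Xtr = X(training(cvp),:);  ytr = y(training(cvp));
Xte = X(test(cvp),:);      yte = y(test(cvp));

knnMdl = fitcknn(Xtr, ytr, 'NumNeighbors', 5, 'ClassNames', [0; 1]);
rfMdl  = fitcensemble(Xtr, ytr, 'Method', 'Bag', 'ClassNames', [0; 1]);

% Second output of predict: one score column per class, ordered as in ClassNames
[~, scoresKnn] = predict(knnMdl, Xte);
[~, scoresRf]  = predict(rfMdl,  Xte);

posclass = 1;                               % column 2 corresponds to class 1
[xKnn, yKnn, ~, aucKnn] = perfcurve(yte, scoresKnn(:,2), posclass);
[xRf,  yRf,  ~, aucRf]  = perfcurve(yte, scoresRf(:,2),  posclass);

% Overlay both ROC curves in one figure
plot(xKnn, yKnn); hold on;
plot(xRf, yRf);
xlabel('False positive rate'); ylabel('True positive rate');
legend(sprintf('kNN (AUC = %.3f)', aucKnn), ...
       sprintf('Bagged trees (AUC = %.3f)', aucRf), 'Location', 'southeast');
hold off;
```

Note that this sketch scores held-out observations, whereas resubPredict in the answer above scores the training observations.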
Ridwan Alam
22 Dec 2019
Edited: Ridwan Alam
22 Dec 2019
Sure. You can find more details about using perfcurve() here:
After saving, you can load those classifiers just like any other variable and call predict() or model.predictFcn(), as you prefer. More details here:
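A minimal sketch of the save, load, and predict round trip, using synthetic data so that it runs on its own. The data, the temporary file location, and newData are assumptions for illustration; in practice newData must have the same predictor columns, in the same order, as the training data.

```matlab
rng(0);                                     % reproducible synthetic data
X = [randn(50,2)+1; randn(50,2)-1];         % hypothetical predictors
y = [ones(50,1); zeros(50,1)];              % hypothetical 0/1 labels
classificationLinearSVM = fitcsvm(X, y, ...
    'KernelFunction', 'linear', 'Standardize', true, 'ClassNames', [0; 1]);

% Save the trained model object to a MAT-file (temporary folder used here)
modelFile = fullfile(tempdir, 'classificationLinearSVM.mat');
save(modelFile, 'classificationLinearSVM');

% Later, possibly in a new session: load the model and classify new rows
S = load(modelFile);                        % struct with one field per saved variable
mdl = S.classificationLinearSVM;
newData = [1.2 0.8; -1.5 -0.7];             % hypothetical new observations
[newLabels, newScores] = predict(mdl, newData);
disp(newLabels);
```

A model object saved this way is used through predict(). The predictFcn form applies to the struct that the Classification Learner app exports, where the call looks like trainedModel.predictFcn(newData).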
Uerm
23 Dec 2019
Cheers, it helped a lot! Is there a difference between saving the model in the for loop or after the loop? Will it make a difference? I have the save function of each model inside the loop now.
Ridwan Alam
23 Dec 2019
Good question. That depends entirely on the purpose of the loop. If the loop is meant to help you find the best-performing model among these different types, you don't need to save the models in every iteration, only their performance. After the loop, you compare those results, pick the best model type, retrain that type, and save it. But if the purpose is different and you want all the intermediate models, you can save them inside the loop. Good luck!
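The "compare performance, then keep one model" workflow described above can be sketched as follows. This is only an illustration under assumptions: the data are synthetic, only two candidate types are shown, and the names models, cvLoss, and bestModel are hypothetical. Each candidate is trained on all of the data, its 10-fold cross-validation error is estimated with crossval() and kfoldLoss(), and only the winner is saved.

```matlab
rng(0);                                     % reproducible synthetic data
X = [randn(100,2)+1; randn(100,2)-1];       % hypothetical predictors
y = [ones(100,1); zeros(100,1)];            % hypothetical 0/1 labels

% Train each candidate type on all of the data
models = struct();
models.svm = fitcsvm(X, y, 'KernelFunction', 'linear', ...
    'Standardize', true, 'ClassNames', [0; 1]);
models.knn = fitcknn(X, y, 'NumNeighbors', 5, 'ClassNames', [0; 1]);

% Estimate the 10-fold cross-validated misclassification rate of each type
names = fieldnames(models);
cvLoss = zeros(numel(names), 1);
for i = 1:numel(names)
    cvMdl = crossval(models.(names{i}), 'KFold', 10);
    cvLoss(i) = kfoldLoss(cvMdl);
end

% Keep only the full model of the best-performing type
[~, best] = min(cvLoss);
bestModel = models.(names{best});
save(fullfile(tempdir, 'bestModel.mat'), 'bestModel');
fprintf('Best model type: %s (CV error %.3f)\n', names{best}, cvLoss(best));
```

Because the saved object was trained on every observation, it corresponds to a "full" model rather than one of the per-fold models.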
Hi again,
I just want to save the "full" models which will be used to classify new data. It should do the same as the "Export Model" button when using the Classification Learner app.
Note: It seems to save it correctly. When I load the exported model and look at the predictors (features) and response (labels), the number of elements of these is approx 90% of the input data, which makes sense, since it trains on 90% of the input and tests on the last 10% (10-fold cross validation).
Another thing: The way I have used tic and toc... Will they only show the elapsed time for one intermediate result? I want the elapsed time for each individual classifier (full models).
As far as I remember, your loop iterates over the folds of the cross-validation, right? In that case, if you put the save() command inside the loop, it will overwrite the file in every iteration, and at the end you will only have the model (of each kind, e.g. SVM, random forest, etc.) trained during the last iteration. You would get the same result by calling save() after the loop, since you use the same model names in every iteration. Hope this makes sense.
About tic-toc: if you want to see the amount of time it takes to train each model, put the toc before the predict() part. Otherwise, you are getting time difference including time to predict and squeeze and so on.
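The timing advice can be sketched like this, with training and prediction timed separately. The data and the variable names tTrain and tPredict are assumptions for illustration.

```matlab
rng(0);                                     % reproducible synthetic data
X = [randn(200,2)+1; randn(200,2)-1];       % hypothetical predictors
y = [ones(200,1); zeros(200,1)];            % hypothetical 0/1 labels

tic
mdl = fitcsvm(X, y, 'KernelFunction', 'linear', ...
    'Standardize', true, 'ClassNames', [0; 1]);
tTrain = toc;                               % training time only

tic
preds = predict(mdl, X);
tPredict = toc;                             % prediction time only

fprintf('Training: %.4f s, prediction: %.4f s\n', tTrain, tPredict);
```

Inside a cross-validation loop, tTrain would be the time for one fold; summing it over the folds gives the total training time per classifier.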
Uerm
6 Jan 2020
Ok, it makes total sense. Regarding the save part... Does it mean that I have to have 10 save commands for each model since it keeps over-writing?
Ridwan Alam
6 Jan 2020
Edited: Ridwan Alam
6 Jan 2020
Say, for the SVM models, if you really want to keep the 10 SVM models, one from each iteration, you can give each a new name in its iteration (e.g. mySvm_1, mySvm_2, ...) and save all of them after exiting the loop. But, again, I don't think it is very common to save the intermediate models from all the iterations of the cross-validation. Good luck.
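One way to keep every per-fold model without inventing numbered variable names is a cell array that is saved once after the loop. This is a sketch under assumptions, not the code from the thread: the data are synthetic, cvpartition is used for the folds, and foldModels and the file name are hypothetical.

```matlab
rng(1);                                     % reproducible synthetic data
X = [randn(100,2)+1; randn(100,2)-1];       % hypothetical predictors
y = [ones(100,1); zeros(100,1)];            % hypothetical 0/1 labels

k = 10;
cvp = cvpartition(y, 'KFold', k);           % stratified 10-fold partition
foldModels = cell(k, 1);                    % one slot per fold

for i = 1:k
    trIdx = training(cvp, i);               % logical index of this fold's training rows
    foldModels{i} = fitcsvm(X(trIdx,:), y(trIdx), ...
        'KernelFunction', 'linear', 'Standardize', true, 'ClassNames', [0; 1]);
end

% A single save after the loop keeps all ten models in one file
save(fullfile(tempdir, 'svmFoldModels.mat'), 'foldModels');
```

After loading the file, foldModels{3} is the model trained in the third iteration.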
Btw, if you liked the conversation, please vote up the response. Thanks!
Uerm
10 Jan 2020
Hi Ridwan,
Thanks a lot, I voted up the response!
I have run into another problem (I have attached the code). When I plot the confusion matrix and ROC curve, the training and validation results seem to be combined into one. For instance, when the numbers in the confusion matrix are summed, the total is exactly equal to the number of all samples (training samples + validation samples). I want two confusion matrices (and two ROC curves, and thus two AUC values) for every model: one for training and one for validation. Is that possible?
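The thread does not answer this question, so the following is only a possible approach, not the poster's code. The idea is to predict on the training rows and on the validation rows separately and to build each confusion matrix and ROC curve from its own set. The sketch uses synthetic data and a single 90/10 hold-out split; all names are hypothetical. In a 10-fold loop, the same idea would mean accumulating the training predictions and the validation predictions in separate arrays.

```matlab
rng(0);                                     % reproducible synthetic data
X = [randn(100,2)+1; randn(100,2)-1];       % hypothetical predictors
y = [ones(100,1); zeros(100,1)];            % hypothetical 0/1 labels
cvp = cvpartition(y, 'HoldOut', 0.1);       % 90% training, 10% validation
Xtr = X(training(cvp),:);  ytr = y(training(cvp));
Xva = X(test(cvp),:);      yva = y(test(cvp));

mdl = fitcsvm(Xtr, ytr, 'KernelFunction', 'linear', ...
    'Standardize', true, 'ClassNames', [0; 1]);

% Predict on each set separately
[predTr, scoreTr] = predict(mdl, Xtr);
[predVa, scoreVa] = predict(mdl, Xva);

% One confusion matrix per set (rows: true class, columns: predicted class)
cmTrain = confusionmat(ytr, predTr, 'Order', [0 1]);
cmVal   = confusionmat(yva, predVa, 'Order', [0 1]);

% One ROC curve and AUC per set; column 2 of the scores belongs to class 1
[fprTr, tprTr, ~, aucTrain] = perfcurve(ytr, scoreTr(:,2), 1);
[fprVa, tprVa, ~, aucVal]   = perfcurve(yva, scoreVa(:,2), 1);

plot(fprTr, tprTr); hold on;
plot(fprVa, tprVa);
xlabel('False positive rate'); ylabel('True positive rate');
legend(sprintf('Training (AUC = %.3f)', aucTrain), ...
       sprintf('Validation (AUC = %.3f)', aucVal), 'Location', 'southeast');
hold off;
```

With this split, the entries of cmTrain sum to the number of training samples and those of cmVal to the number of validation samples, instead of one matrix summing to all samples.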
