Crossvalidation: anonymous function handle with toolbox classifiers

Cristobal on 26 Apr 2012
Hi everyone,
I'd like to use MATLAB's cross-validation function (crossval) with a random forest classification toolbox (specifically http://code.google.com/p/randomforest-matlab/). According to the definition of predfun in the documentation (http://www.mathworks.com/help/toolbox/stats/crossval.html), I should supply a function that returns the predictions for a set of test data XTEST. So, following that syntax, I should pass a function like this:
classf= @(XTRAIN,ytrain,XTEST) classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000));
This function takes XTEST and the model, which in turn needs XTRAIN and ytrain. The problem comes when I run the cross-validation: I get the following error message.
cvMCR = crossval('mcr',X,y,'predfun',classf)
Error using crossval>evalFun (line 465)
The function
'@(XTRAIN,ytrain,XTEST)classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000))'
generated the following error:
Cannot concatenate a double array and a nominal array.
Error in crossval>getLossVal (line 502)
funResult = evalFun(funorStr,arg(1:end-1));
Error in crossval (line 401)
[funResult,outarg] = getLossVal(i, nData, cvp, data, predfun);
I'd really appreciate any help.
Regards!

Answers (4)

Ilya on 26 Apr 2012
I think you've hit a bug in the crossval function. My guess is that classRF_predict returns numeric labels, and crossval does not process them correctly for the 'mcr' criterion. The workaround is to convert class labels returned by classRF_predict to the nominal type:
classf= @(XTRAIN,ytrain,XTEST) nominal(classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000)));
and then call crossval in the same way as before:
cvMCR = crossval('mcr',X,y,'predfun',classf)
Alternatively, you could use the other signature for crossval
vals = crossval(fun,X,y)
and define
fun = @(Xtrain,Ytrain,Xtest,Ytest) mean(Ytest ~= classRF_predict(Xtest,classRF_train(Xtrain,Ytrain,1000)));
In this case, since you are comparing the true and predicted labels yourself, you can keep them numeric.
Let me know if either solution works for you.
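
For anyone who wants to try the second signature without installing randomforest-matlab, here is a minimal sketch. The data are synthetic, and predict1nn is a hypothetical stand-in for the classRF_train/classRF_predict pair (a 1-nearest-neighbour rule built on knnsearch), so it only illustrates how crossval calls fun, not how the random forest behaves.

```matlab
% Synthetic two-class data with numeric labels (made up for this sketch).
rng(0);
X = [randn(50,2); randn(50,2) + 3];
y = [ones(50,1); 2*ones(50,1)];

% Hypothetical stand-in for
%   classRF_predict(Xtest, classRF_train(Xtrain,Ytrain,1000)):
% a 1-nearest-neighbour rule that returns numeric labels.
predict1nn = @(Xtest,Xtrain,Ytrain) Ytrain(knnsearch(Xtrain,Xtest));

% fun receives the training and test subsets of X and y and returns
% the misclassification rate on that test fold.
fun = @(Xtrain,Ytrain,Xtest,Ytest) mean(Ytest ~= predict1nn(Xtest,Xtrain,Ytrain));

vals  = crossval(fun,X,y);   % one value per fold (10 folds by default)
cvMCR = mean(vals);          % average of the per-fold rates
```

Because the comparison Ytest ~= yfit happens inside fun, both sides stay numeric and crossval never has to compare label types itself.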

Ilya on 26 Apr 2012
I am not an expert on the randomforest-matlab package, so my advice could be off. I find two things in your post worth investigating:
  1. It is strange that you use Xtest as the 1st input to classRF_predict(XTEST,classRF_train(XTRAIN,ytrain,1000)). Usually it is the trained object that is the 1st argument.
  2. Make sure that the array of class labels, y, you pass to crossval has the same type as labels returned by classRF_predict.
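
A quick way to check the second point is to compare the classes of the two label arrays. This is only a sketch with made-up values: y and yfit below are hypothetical and stand in for your true labels and for the output of classRF_predict.

```matlab
% Hypothetical labels and predictions, for illustration only.
y    = nominal([1; 2; 1; 2]);   % labels stored as a nominal array
yfit = [1; 2; 2; 2];            % numeric predictions, as a classifier might return

% Compare the types of the two arrays before comparing their values.
typesMatch = strcmp(class(y), class(yfit));
fprintf('class(y) = %s, class(yfit) = %s, match = %d\n', ...
        class(y), class(yfit), typesMatch);
```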

Cristobal on 26 Apr 2012
Ilya,
Have you used crossval with an external toolbox? If so, could you give me an example?
It's a bit strange, because the original usage is as follows:
model = classRF_train(XTRAIN,ytrain);
yfit = classRF_predict(XTEST,model);
As you can see, I just substituted the first line into the second.
And yes, I'm sure that classRF_predict returns yfit in the same domain as ytrain.
I suspect there is something wrong with the first point, where I nested one function inside the other, but I can't figure it out.
Regards

Cristobal on 26 Apr 2012
I think I'm on the trail here. I noticed that the problem was in the input parameters. The example from http://www.mathworks.com/help/toolbox/stats/crossval.html works with the classify function and the fisheriris data set, where the targets are of cell data type. So I tried casting the input of the anonymous function to double, int8, string... Looking at the error message, I saw that at some point it fails when comparing nominal data with double data (which is not supported). Line 410 of crossval.m is where that comparison is made. To get it working, I changed the original line
temploss = sum(outarg ~= funResult);
to
temploss = sum(double(outarg) ~= funResult);
forcing the outarg variable to be double.
I really don't know if there's a simpler way to solve this problem. I don't think it's the best solution, but it works.
  1 comment
Ilya on 27 Apr 2012
Did you see my answer above?
You can modify crossval if you'd like, but in that case do
temploss = sum(outarg ~= nominal(funResult));
That way you can continue using crossval with labels of all types. After what you did, you can only use crossval with handles that return labels of type double.
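
One more thing to watch with the double(outarg) patch, shown as a small sketch: double() applied to a nominal array returns the internal level codes, not the original label values. The labels 3 and 5 below are made up for illustration, and I am assuming outarg is a nominal array built from y.

```matlab
% Hypothetical numeric labels that are not 1..K (illustration only).
y     = [3; 5; 3];
yNom  = nominal(y);     % levels are labelled '3' and '5'
codes = double(yNom);   % internal level codes, not the original values

% The codes match y only when the labels are already 1..K in sorted order.
codesMatchLabels = isequal(codes, y);
```

Under that assumption, comparing double(outarg) with numeric predictions is only meaningful when the labels are the consecutive integers 1..K, which is another reason to convert funResult to nominal instead.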
