Why do different runs of sequentialfs give different lists of features?

Gaurav Thareja on 27 Jul 2015
Commented: Mango Wang on 29 Jun 2019
I am running sequentialfs for feature selection. Different runs return different lists of selected features. My data set has 11 features. Should I run sequentialfs multiple times and keep the features that are selected in 75% of the runs?

Answers (1)

Walter Roberson on 27 Jul 2015
In the documentation for sequentialfs, notice 'mcreps', the number of Monte-Carlo repetitions. Monte-Carlo always implies randomness.
Now look at 'options' and see UseSubstreams and Streams, which talk about which random number generator to use. UseSubstreams is only meaningful if parallel processing is turned on, but Streams is used either way. Clearly randomness is part of the calculation.
If you want consistent output, initialize your random number generator.
Try increasing mcreps to have it automatically run multiple times.
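To illustrate the advice about initializing the random number generator, here is a minimal sketch, not a definitive recipe. The synthetic data, the seed values, and the criterion function (misclassification count using classify from the Statistics Toolbox) are all assumptions made for illustration. The idea is to seed the generator before the cross-validation partition is created, then run the selection twice and compare.

```matlab
% Hypothetical synthetic data: 100 observations, 11 features, binary labels.
rng(1, 'twister');
X = randn(100, 11);
y = double(X(:,1) + 0.5*X(:,2) + 0.3*randn(100,1) > 0);

% Assumed criterion: number of misclassified test observations
% using linear discriminant analysis (classify).
critfun = @(XT, yT, Xt, yt) sum(yt ~= classify(Xt, XT, yT));
opts = statset('Display', 'off');

sel = cell(1, 2);
for r = 1:2
    % Seed the global generator BEFORE the partition is created,
    % so the fold assignment is identical in both runs.
    rng(5489, 'twister');
    c = cvpartition(y, 'KFold', 5);
    sel{r} = sequentialfs(critfun, X, y, 'cv', c, 'options', opts);
end

% Compare the two selections (logical masks over the 11 features).
disp(find(sel{1}));
disp(isequal(sel{1}, sel{2}));
```

With a fixed partition and a deterministic criterion, the two runs have no remaining source of randomness in this sketch, which is why the seed has to be set before cvpartition is called.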
  Comments: 2
Melissa McCoy on 16 Aug 2015
Edited: Melissa McCoy on 16 Aug 2015
Thanks for providing this info!
When you say to initialize the random number generator, do you mean something like the code below? I have also increased mcreps to 5, but I am still getting different features returned. (The search covers 345 features across 448 data entries. Note that the features are dummy variables derived from ~70 features with 4-5 categories each.)
c = cvpartition(dumY(:,1),'k',5);
stream = RandStream('mrg32k3a','Seed',5489);
opts = statset('display','iter','Streams',stream);
inmodel = sequentialfs(@my_fun_lib,XMat,dumY(:,1),'cv',c,'mcreps',5,'options',opts);
Can you advise on my error or on ways to solve the issue? My features have quite a bit of data that is missing not at random, which I handled by adding an extra "Unsure" category to each feature, and I am not sure whether this could be causing the issue.
Many thanks!
Mango Wang on 29 Jun 2019
The asker may no longer care about the answer, but I am putting my thoughts here for anyone who asks later.
It is not the Monte Carlo repetitions that cause the different results but the way you use cross-validation. Cross-validation partitioning is random, which leads to different results. Because you call cvpartition before RandStream, that is, you initialize the random number generator only after the cv object has already been created, sequentialfs returns different results, especially when your code is inside a function.
One way to avoid this is to pass 'cv' a numeric value rather than a cvpartition object, without the need to initialize the stream.
Another way is to increase mcreps, which performs Monte Carlo repartitioning based on the cvpartition.
But I guess a better way is to run it thousands of times to get a really robust feature subset.
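The idea of repeating the selection many times, combined with the 75% threshold from the original question, could be sketched roughly as follows. The synthetic data, the criterion function, and the run count of 50 (kept small so the example finishes quickly, far fewer than the thousands suggested above) are assumptions for illustration only.

```matlab
% Hypothetical synthetic data: 120 observations, 11 features, binary labels.
rng(1, 'twister');
X = randn(120, 11);
y = double(X(:,1) - X(:,3) + 0.5*randn(120,1) > 0);

% Assumed criterion: number of misclassified test observations
% using linear discriminant analysis (classify).
critfun = @(XT, yT, Xt, yt) sum(yt ~= classify(Xt, XT, yT));
opts = statset('Display', 'off');

nRuns  = 50;                          % illustrative; increase for stability
counts = zeros(1, size(X, 2));
for r = 1:nRuns
    c = cvpartition(y, 'KFold', 5);   % a new random partition on every run
    inmodel = sequentialfs(critfun, X, y, 'cv', c, 'options', opts);
    counts = counts + inmodel;        % tally how often each feature is picked
end

freq   = counts / nRuns;              % selection frequency per feature
stable = find(freq >= 0.75);          % features chosen in at least 75% of runs
disp(freq);
disp(stable);
```

Because the generator is seeded once at the top, the whole experiment is repeatable while each individual run still sees a different partition.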
