Why do different runs of sequentialfs give different lists of features?

Question

Gaurav Thareja, 27 Jul 2015

https://kr.mathworks.com/matlabcentral/answers/231305-why-different-runs-of-sequentialfs-give-different-list-of-features

Commented: Mango Wang, 29 Jun 2019

I am running sequentialfs for feature selection, but different runs return different lists of selected features. My data set has 11 features. Should I run sequentialfs multiple times and keep the features that are selected in at least 75% of the runs?
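The voting scheme proposed in the question can be sketched as follows. This is a minimal sketch under stated assumptions, not a method endorsed in the thread: the data are synthetic (11 features, of which only features 1 and 2 are informative), `critfun` is a hypothetical least-squares criterion standing in for a real model, and `nRuns = 20` is an arbitrary choice. Each run draws a fresh 5-fold partition, and a feature is kept if it is selected in at least 75% of the runs.

```matlab
% Synthetic stand-in data (assumption, not the asker's data set):
% 11 features, of which only features 1 and 2 drive the response.
rng(1, 'twister');
n = 200;
p = 11;
X = randn(n, p);
y = 3*X(:,1) - 2*X(:,2) + 0.5*randn(n, 1);

% Hypothetical criterion: sum of squared errors on the test fold of a
% least-squares fit to the training fold (no toolbox model needed).
critfun = @(XT, yT, Xt, yt) sum((yt - Xt*(XT\yT)).^2);

nRuns = 20;            % number of independent selection runs (arbitrary)
counts = zeros(1, p);  % how many runs selected each feature
for r = 1:nRuns
    % The generator is deliberately NOT reseeded here, so every run
    % draws a fresh 5-fold partition.
    c = cvpartition(n, 'KFold', 5);
    inmodel = sequentialfs(critfun, X, y, 'cv', c);
    counts = counts + inmodel;   % logical row added to a double row
end

freq = counts / nRuns;            % selection frequency per feature
selected = find(freq >= 0.75);    % keep features chosen in >= 75% of runs
disp(freq);
disp(selected);
```

Because the folds change from run to run, the frequencies of weak or irrelevant features will fluctuate, while strongly informative features are expected to be selected in every run.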

Answer 1

Walter Roberson, 27 Jul 2015

https://kr.mathworks.com/matlabcentral/answers/231305-why-different-runs-of-sequentialfs-give-different-list-of-features#answer_187293

In the documentation for sequentialfs, notice the 'mcreps' parameter: the number of Monte-Carlo repetitions of the cross-validation. Monte Carlo *always* implies randomness.

Now look at 'options' and see UseSubstreams and Streams, which control which random number generator is used. UseSubstreams is meaningful only if parallel processing is turned on, but Streams is used either way. Clearly, randomness is part of the calculation.

If you want consistent output, initialize (seed) your random number generator before running the selection.

Try increasing mcreps to have it automatically repeat the cross-validation multiple times.
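A minimal sketch of seeding the generator, assuming synthetic regression data and a hypothetical least-squares criterion `critfun` (neither is from the thread). The global generator is reset with `rng` before the cvpartition object is built, so both the initial 5-fold split and the Monte-Carlo repartitions requested by `'mcreps', 5` come from the same random sequence on every run, and the two selections should match.

```matlab
% Synthetic stand-in data (assumption, not from the thread).
rng(1, 'twister');
n = 150;
X = randn(n, 11);
y = X(:,3) - X(:,7) + 0.5*randn(n, 1);

% Hypothetical criterion: test-fold sum of squared least-squares errors.
critfun = @(XT, yT, Xt, yt) sum((yt - Xt*(XT\yT)).^2);

sel = cell(1, 2);
for run = 1:2
    % Seed the global generator first, then build the partition, so the
    % initial folds and the later Monte-Carlo repartitions are the same
    % on every run.
    rng(5489, 'twister');
    c = cvpartition(n, 'KFold', 5);
    sel{run} = sequentialfs(critfun, X, y, 'cv', c, 'mcreps', 5);
end
disp(isequal(sel{1}, sel{2}));   % compare the two selected feature sets
```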

Comments (2)

Melissa McCoy, 16 Aug 2015

Edited: Melissa McCoy, 16 Aug 2015

Thanks for providing this info!

When you say "initialize your random number generator", do you mean something like the code below? I've also increased mcreps to 5, but I'm still getting different features returned. (It's searching through 345 features for 448 observations; the features are dummy variables derived from ~70 original features with 4-5 categories each.)

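      c = cvpartition(dumY(:,1),'k',5);                  % 5-fold partition, stratified by class
      stream = RandStream('mrg32k3a','Seed',5489);       % dedicated stream with a fixed seed
      opts = statset('display','iter','Streams',stream); % print progress; draw from 'stream'
      inmodel = sequentialfs(@my_fun_lib,XMat,dumY(:,1),'cv',c,'mcreps',5,'options',opts);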

Can you point out my error or suggest ways to solve the issue? My features do have quite a bit of data that is missing not at random, which I've handled by adding an extra "Unsure" category to each feature, and I'm not sure whether this could be causing the issue.

Many thanks!

Mango Wang, 29 Jun 2019

The asker may no longer care about the answer, but I'll put my thoughts here for future readers.

It's not the Monte Carlo repetition that causes the different results but the way you use cross-validation: the cross-validation partition is random, and that leads to different results. Because you call cvpartition before creating the RandStream, i.e., the cv object is built from the unseeded global random stream (the stream you create afterwards is only handed to sequentialfs through 'options'), sequentialfs returns different results from run to run, especially when your code is inside a function.
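Following this diagnosis, a sketch of Melissa's snippet with the calls reordered might look like this. It is an illustration under assumptions: `XMat` and `yClass` are synthetic stand-ins for the real data, and `critfun` is a hypothetical criterion replacing `my_fun_lib`. The global generator is seeded before `cvpartition`, and a fresh `RandStream` is created on every run and passed through `'Streams'`, so both the initial stratified folds and the later repartitions repeat.

```matlab
% Synthetic stand-ins for XMat and dumY(:,1) (assumption): binary labels
% that depend on features 1 and 4.
rng(2, 'twister');
n = 200;
XMat = randn(n, 11);
yClass = double(XMat(:,1) + XMat(:,4) + 0.5*randn(n, 1) > 0);

% Hypothetical stand-in for my_fun_lib: misclassification count of a
% least-squares fit thresholded at 0.5.
critfun = @(XT, yT, Xt, yt) sum(yt ~= (Xt*(XT\yT) > 0.5));

sel = cell(1, 2);
for run = 1:2
    rng(5489, 'twister');                  % seed the global stream FIRST ...
    c = cvpartition(yClass, 'KFold', 5);   % ... so the stratified folds repeat
    stream = RandStream('mrg32k3a', 'Seed', 5489);   % fresh stream per run
    opts = statset('Display', 'off', 'Streams', stream);
    sel{run} = sequentialfs(critfun, XMat, yClass, 'cv', c, ...
                            'mcreps', 5, 'options', opts);
end
disp(isequal(sel{1}, sel{2}));   % compare the two selected feature sets
```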

One way to avoid this is to pass c as a number (the number of folds) rather than as a cvpartition object, without needing to initialize the stream.

Another way is to increase mcreps, which performs Monte-Carlo repartitioning of the cvpartition object.

But I think a better way is to run it thousands of times to get a truly robust feature subset.
