How to return 'X' number of unique subsets (combinations) of 'N' numbers taken 'K' at a time

I need to return X number of unique combinations of N numbers (i.e., vector V of length N) taken K at a time.
I can't use 'nchoosek' because I don't want ALL unique combinations. I just want X number of them and 'nchoosek' will crash if I enter the actual values for V and K because V is too large.
Here's an example, with more descriptive variable names…
origSet = rand(1,500); %the full original (example) set of numbers
desNumComb = 10000; %the number of unique combinations/subsets that I want to end up with
subsetSize = 10; %the desired size for each combination/subset
allCombos = nchoosek(1:length(origSet), subsetSize); %will return ALL possible combinations (if it ran)
subsetInds = allCombos(1:desNumComb,:); %the indices for each of the desNumComb subsets
Worth mentioning is that the size of the original set of numbers [i.e., length(origSet) ], the desired subset size [i.e., subsetSize], and the desired number of unique combinations [i.e., desNumComb] will possibly vary every time I loop through, which will be many times.
Thanks in advance to all.
Cheers, John
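
To put a number on why nchoosek cannot be used here, the size of the full combination space can be estimated without enumerating it. The following is a minimal sketch, assuming the example sizes from the question (500 items taken 10 at a time); the variable names n, k, approxCount and approxBytes are illustrative and not from the original post. It uses gammaln so that no large factorial is ever formed.

```matlab
% Size of the full combination space, computed without enumerating it.
n = 500;   % length(origSet) in the question's example
k = 10;    % subsetSize in the question's example

% log(nchoosek(n,k)) via gammaln avoids overflow from large factorials
logCount = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);
approxCount = exp(logCount);

% Memory the full nchoosek(1:n,k) index matrix would need (8 bytes per double)
approxBytes = approxCount * k * 8;

fprintf('About %.3g combinations, roughly %.3g bytes to store them all\n', ...
        approxCount, approxBytes);
```

For these sizes the count is on the order of 10^20, so any approach has to draw subsets directly rather than enumerate them and pick rows.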
Comments: 2
Walter Roberson on 23 Jul 2015
Which X subsets? The "first" X subsets under some specific ordering? X random subsets? Are you using this to iterate through all the possibilities in batches?
John Trimper on 24 Jul 2015
Hi Walter,
It doesn't matter which X subsets out of the full range of unique possibilities. What matters is that they're all unique.
Here's what I'm doing: I need to compare two groups, but they have very different numbers of samples. One group has up to several hundred, while the other might have as few as 5. The metric I'm using is biased, so I need to equate the number of samples in each group. What I want to do is repeatedly subsample the larger group down to match the number of samples in the smaller group, up to 10,000 times (but not more), and then average the measurements taken across those 10,000 subsamples. Since the total number of unique combinations is WAY more than I need (incomputable by nchoosek), I need a way to get only a chosen, smaller number of unique combinations.
I hope that helps to clarify. Thank you for your time.
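
The workflow described in this comment (subsample the larger group to the smaller group's size, measure, then average) could be sketched roughly as follows. Everything here is an assumption for illustration: the data are random, metricFcn is a placeholder and not the poster's actual metric, and numSub is kept smaller than the 10,000 mentioned in the post so the example runs quickly.

```matlab
% Hypothetical data and metric, for illustration only
bigGroup   = randn(1, 300);               % larger group (assumed example data)
smallGroup = randn(1, 5);                 % smaller group (assumed example data)
metricFcn  = @(a, b) mean(a) - mean(b);   % placeholder, not the poster's metric

k      = numel(smallGroup);   % subsample size matches the smaller group
numSub = 1000;                % number of subsamples to draw

% Draw index subsets; sorting each row makes the same subset always look the same
inds = zeros(numSub, k);
for s = 1:numSub
    inds(s, :) = sort(randperm(numel(bigGroup), k));
end
inds = unique(inds, 'rows', 'stable');    % drop any repeated subsets

% Evaluate the metric on each subsample, then average
vals = zeros(size(inds, 1), 1);
for s = 1:size(inds, 1)
    vals(s) = metricFcn(bigGroup(inds(s, :)), smallGroup);
end
avgMetric = mean(vals);
```

Working with indices rather than values keeps the subsets usable for any per-sample data attached to the larger group.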


Accepted Answer

John Trimper on 27 Jul 2015
Edited: Walter Roberson on 27 Jul 2015
Answer provided by Star Strider & Walter Roberson above, worked out in comments, summarized here:
Use randperm to generate more vectors than necessary, then use unique(A, 'rows', 'stable') to select only unique combinations.
Example code for those interested:
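biggerGroup = rand(1,100); %example data for the larger group
subsetSize = 10; %desired size of each subset
numShuf = 20000; %more shuffles than I actually need
desNumUniShuf = 10000; %actual desired # of unique shuffles
mixer = zeros(1, length(biggerGroup));
mixer(1:subsetSize) = 1; %mask with exactly subsetSize ones
allCombs = zeros(numShuf, subsetSize); %preallocate one row per shuffle
for s = 1:numShuf
    mixer = mixer(randperm(length(mixer))); %shuffle the mask
    allCombs(s,:) = biggerGroup(mixer==1); %values picked by the mask, in original order
end
uniqueShufs = unique(allCombs, 'rows', 'stable'); %drop repeated rows, keep first-seen order
%the next line errors if fewer than desNumUniShuf unique rows were drawn; raise numShuf if so
myUniShufs = uniqueShufs(1:desNumUniShuf,:);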
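
The code above relies on drawing enough extra shuffles that at least the desired number turn out unique. A variant that tops up until exactly the requested number of unique subsets has been collected might look like the sketch below. It is an illustration, not part of the original answer: it returns index subsets rather than values, uses randperm(n,k) instead of a shuffled mask, and the names n, k, numWanted and subsetInds are assumed for the example.

```matlab
% Sketch: collect exactly numWanted unique index subsets, topping up as needed
n = 100;            % size of the larger group (length(biggerGroup) above)
k = 10;             % subset size
numWanted = 10000;  % desired number of unique subsets

% Cap the request at the number of subsets that exist (computed in log form
% so that very large counts do not overflow)
logTotal = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1);
if logTotal < log(numWanted)
    numWanted = round(exp(logTotal));
end

subsetInds = zeros(0, k);
while size(subsetInds, 1) < numWanted
    numMissing = numWanted - size(subsetInds, 1);
    batch = zeros(numMissing, k);
    for s = 1:numMissing
        batch(s, :) = sort(randperm(n, k));   % sorted, so order within a subset is ignored
    end
    % Merge with what has been collected so far and drop repeats
    subsetInds = unique([subsetInds; batch], 'rows', 'stable');
end
```

Because n, k and numWanted are plain variables, the same block can sit inside a loop where all three change on every pass. When numWanted is close to the total number of possible subsets, the top-up loop slows down considerably, and enumerating with nchoosek is likely the better choice in that regime.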
