Matlab find unique column-combinations in matrix and respective index

Benvaulter on 22 Mar 2017
Edited: Jan on 23 Mar 2017
I have a large matrix with many rows and a limited (but larger than 1) number of columns, containing values between 0 and 9. I would like to find an efficient way to identify the unique row-wise combinations and their indices, to then build sums (somewhat like a pivot logic). Here is an example of what I am trying to achieve:
a =
1 2 3
2 2 3
3 2 1
1 2 3
3 2 1
uniqueCombs =
1 2 3
2 2 3
3 2 1
numOccurrences =
2
1
2
indices:
[1;4]
[2]
[3;5]
From matrix a, I want to first identify the unique combinations (row-wise), then count the number of occurrences and identify the row indices of the respective combination.
I have achieved this by generating strings with num2str and strcat, but this method appears to be very slow. Along these lines I have tried to find a way to form a new unique number by concatenating the values horizontally, but MATLAB does not seem to support this directly (e.g. from [1 2 3] build 123). Sums won't work, because different combinations can have the same sum and could then no longer be told apart. Any suggestions on how to best achieve this? Thanks!
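The "concatenate the digits into one number" idea from the question can be sketched without strings by multiplying each row with decimal place values. This is only a minimal sketch: it assumes every entry is a single digit (0 to 9), as stated in the question, and few enough columns (about 15) that the result is still an exact integer in a double. The names weights, keys, uniqueKeys and groupIdx are made up for illustration.

```matlab
% Example matrix from the question; all entries are single digits (0-9).
a = [1 2 3; 2 2 3; 3 2 1; 1 2 3; 3 2 1];

% Decimal place values, one per column: [100; 10; 1] for three columns.
weights = (10 .^ (size(a, 2) - 1 : -1 : 0)).';

% Each row becomes one number, e.g. [1 2 3] -> 123.
keys = a * weights;

% Unique keys in sorted order, plus the group number of every row.
[uniqueKeys, ~, groupIdx] = unique(keys);

% Number of rows that share each key.
numOccurrences = accumarray(groupIdx, 1);
```

With entries larger than 9 the keys would no longer be unique per combination, so in the general case unique(a, 'rows') is the safer route.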

Accepted Answer

Guillaume on 22 Mar 2017
More or less the same as Jan's, using accumarray instead of splitapply (I'm still old school!):
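A = [1 2 3
     2 2 3
     3 2 1
     1 2 3
     3 2 1];
% B holds the unique rows; ib(k) is the row of B that matches A(k, :)
[B, ~, ib] = unique(A, 'rows');
% count how many rows of A map to each unique row
numoccurences = accumarray(ib, 1);
% collect the row numbers of A per unique row; find(ib) simply generates (1:size(A,1))'
indices = accumarray(ib, find(ib), [], @(rows){rows});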
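The question mentions building sums per combination "somewhat like a pivot logic". The same group vector ib can be reused for that. A minimal sketch, where vals is a hypothetical value column (one entry per row of A) that does not appear in the original post:

```matlab
A = [1 2 3; 2 2 3; 3 2 1; 1 2 3; 3 2 1];
vals = [10; 20; 30; 40; 50];      % hypothetical values, one per row of A

% ib(k) is the row of B that matches A(k, :)
[B, ~, ib] = unique(A, 'rows');

% Sum the values of all rows that share the same combination.
groupSums = accumarray(ib, vals);

% Other reductions work the same way, e.g. the maximum per combination.
groupMax = accumarray(ib, vals, [], @max);
```

Row k of groupSums belongs to row k of B, so [B, groupSums] gives a small pivot-style table.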
4 Comments
Guillaume on 23 Mar 2017
Edited: Guillaume on 23 Mar 2017
I suspect that accumarray will be faster as it is built-in compiled code whereas splitapply is m code, but I haven't conducted any test.
Note: for the indices,
indices = accumarray(ib, (1:numel(ib))', [], @(rows){rows});
is probably slightly faster, just not as concise.
Jan on 23 Mar 2017
Edited: Jan on 23 Mar 2017
@Guillaume: I compare this with cellfun: in older versions MATLAB shipped the C sources for this Mex function. There, calling a function handle is very expensive, because the call has to go back to the MATLAB level. Therefore the built-in methods that are selected by a string, such as 'length' or 'isclass', are much faster.
Then using a compiled Mex function is not a real benefit, because mexCallMATLAB has some overhead. This might concern accumarray also. I guess that your accumarray approach is faster than the loop, but I know that it looks very cryptic ;-)
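For readers unfamiliar with the two cellfun calling styles Jan refers to, here is a small sketch. The cell arrays C and big are made-up test data, and the timings are left unstated because they depend on the MATLAB release and machine.

```matlab
C = {1:3, 'abcd', [], {1, 2}};            % hypothetical test cell

% Legacy syntax: the method is named by a string and runs inside cellfun.
lenByName = cellfun('length', C);
isDblByName = cellfun('isclass', C, 'double');

% Function-handle syntax: cellfun calls back into MATLAB for each cell.
lenByHandle = cellfun(@length, C);

% Rough timing comparison on a larger cell array.
big = num2cell(rand(1, 1e5));
tName = timeit(@() cellfun('length', big));
tHandle = timeit(@() cellfun(@length, big));
```

Both styles return the same lengths; only the call overhead differs.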
But now I can leave the speculations and run a test: With
A = randi([1, 100], 1e5, 3); % Test data
my loop takes 14.75 seconds, your accumarray approach takes 0.44 seconds. The results differ in the order of the indices. So perhaps this is wanted:
[B, iB, iA] = unique(A, 'rows');
indices = accumarray(iA, (1:numel(iA)).', [], @(r){sort(r)});
The result is clear: @Benvaulter, please unaccept my answer and select Guillaume's, and of course use it also to save time and energy.
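A check along the lines of Jan's test can be sketched as follows. It uses fewer rows than the 1e5 in the comment so the loop finishes quickly, and the variable names idxAccum, idxLoop and sameResult are chosen here for illustration only.

```matlab
A = randi([1, 100], 1e4, 3);              % smaller test data than in the comment

[B, ~, iA] = unique(A, 'rows');

% accumarray version, with every index list sorted ascending
idxAccum = accumarray(iA, (1:numel(iA)).', [], @(r){sort(r)});

% loop version: group k collects the rows of A that equal B(k, :)
n = size(B, 1);
idxLoop = cell(n, 1);
for k = 1:n
    idxLoop{k} = find(iA == k);
end

% After sorting, both approaches should give identical index lists.
sameResult = isequal(idxAccum, idxLoop);
```

Wrapping each variant in tic/toc or timeit reproduces the kind of comparison reported above; the absolute times will differ by machine.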


More Answers (1)

Jan on 22 Mar 2017
Edited: Jan on 23 Mar 2017
A = [ 1 2 3; ...
2 2 3; ...
3 2 1; ...
1 2 3; ...
3 2 1];
[B, iB, iA] = unique(A, 'rows');
G = unique(iA);
% iA serves as the grouping vector; numel counts the rows in each group
numOccurrences = splitapply(@numel, iA, iA);
I cannot test a method to obtain the indices list as wanted. I assume this works with splitapply also. A simple loop approach at least:
n = length(G);
indices = cell(1, n);
for k = 1:n
indices{k} = find(iA == G(k));
end
[EDITED] Code is tested now. Use Guillaume's much faster solution for production work.
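The answer assumes that splitapply can also produce the index lists. A minimal sketch of how that might look, assuming a release that includes splitapply (R2015b or later); rowNumbers is a helper name introduced here:

```matlab
A = [1 2 3; 2 2 3; 3 2 1; 1 2 3; 3 2 1];
[B, ~, iA] = unique(A, 'rows');

% Row numbers of A, to be split into groups by iA.
rowNumbers = (1:numel(iA)).';

% Count the rows in each group.
numOccurrences = splitapply(@numel, rowNumbers, iA);

% Wrap each group's row numbers in a cell, because the groups differ in size.
indices = splitapply(@(r){r}, rowNumbers, iA);
```

The cell wrapper is needed because splitapply expects one scalar result per group, and a 1-by-1 cell qualifies.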