find unique sets of values in matrix, eliminate duplications

Question

Dean Ranmar, 18 October 2018

https://kr.mathworks.com/matlabcentral/answers/424717-find-unique-sets-of-values-in-matrix-eliminate-duplications

Commented: Bruno Luong, 21 October 2018

I had a lot of trouble wording this question succinctly and accurately. I have large matrices with a fixed number of rows but varying numbers of columns. I want to identify which columns have identical sets of values in a small subset of rows, then eliminate all but one of the like columns using a criterion applied to values in another row that is not part of that subset. For example, I have a matrix with 20 rows and N columns (20 x N). I want to identify the unique combinations of values in, say, rows 5 & 6 and then, from each group of columns with identical values in rows 5 & 6, save only the column that has the maximum value in row 3.

A=[8     7     4     8     4     2     1     9     8     6
   0     4     3     8     8    10     6     4     3
   9     8     5     6     3     0     4     2     7
   9     8     7     6     5     8     5     4     2
   7     2     9     9     2     8     4     1     7
   8     5    10     3     6     9     1     1     8
   7     5     6     8     3     1     2     9     4
   4     6     1     8     7     4     1    10     6
   7     7     2     4     7     3     2     6     8
   2     8     3     6     8     8     2     1     1
   7     3     8     1     5     4     4     2     9
   0     7     3     1     1     9     1     4     8
   3     7     8     5     2     2     9     8     5
   1     2     3     8     9     3     9     0     4
   1     1     9     9     2     2     5     1     5
   8     5     4     1     8     1     5     2     3
   7    10     2     6     5     9     3     7     5
   3     3     3     5    10     6     9     7     5
  10     6     6     0     1     6     4     7     8
   0     2     5     3     4     2     1     5     8];

In the above example, columns 2 & 10 have the same pair of values in rows 5 & 6: (7 & 8). I then want to eliminate the column(s) with the smaller value in row 3. In this case column 2 has the value 9 and column 10 has the value 7, so I want to set A = A(:,1:9) or A(:,10) = []. I have tried using the unique function to identify pairs (sets, in general) of identical values [after transposing the matrix so I can work on rows], but I must not be using it properly. I assume I will use sortrows to sort the subsets in descending order (assuming my criterion is to save only the column with the max value in another row), either before or after identifying like columns, and drop all but the max-value column.

Comments: 2

madhan ravi, 18 October 2018

Give an example of your expected output so that it's easy to understand.

Dean Ranmar, 18 October 2018

I should have! Thanks - I got answers; see below.

Answer 1

Andrei Bobrov, 18 October 2018

https://kr.mathworks.com/matlabcentral/answers/424717-find-unique-sets-of-values-in-matrix-eliminate-duplications#answer_342083

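r = [3;5;6];                                      % row 3 ranks the columns; rows 5 and 6 form the key
B = A(r,:)';                                      % one row of B per column of A
[~,~,c] = unique(B(:,2:3),'rows','stable');       % c(k): key-group index of column k
ii = find(histcounts(c,1:max(c)+1) > 1);          % groups that occur more than once
[lo,jj] = ismember(c,ii);                         % lo: column is in a duplicated group; jj: which one
A(:,cell2mat(accumarray(jj(lo),find(lo),[],@(x){x(min(B(x,1))==B(x,1))}))) = [];   % per group, delete the column(s) holding the minimum of row 3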

Comments: 5 (3 older comments not shown)

Dean Ranmar, 18 October 2018

Edited: Dean Ranmar, 18 October 2018

OK. I think the function you define always drops only the minimum value? I edited that last line to look like this:

A(:,cell2mat(accumarray(jj(lo),find(lo),[],@(x){x(max(B(x,1))~=B(x,1))}))) = [];

and it seems to work. I will test it more thoroughly. Thank you SO much; this helped me a lot and saved a lot of time!
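To see why the edited predicate matters, here is a minimal sketch on a made-up matrix in which three columns share a key. The matrix and the names dropMin, dropNotMax, A1 and A2 are illustrative assumptions, not from the thread; row 1 plays the role of row 3 in the question and rows 2:3 play the role of rows 5 & 6.

```matlab
% Hypothetical 3-by-5 matrix: row 1 ranks the columns, rows 2:3 form the key.
% Columns 1, 3 and 5 all carry the key (7,8).
A = [9 4 7 2 5
     7 1 7 3 7
     8 2 8 6 8];
B = A';                                           % one row of B per column of A
[~,~,c] = unique(B(:,2:3),'rows','stable');       % key-group index per column
ii = find(histcounts(c,1:max(c)+1) > 1);          % groups occurring more than once
[lo,jj] = ismember(c,ii);                         % members of those groups

% Original predicate: selects only the minimum of each duplicated group
dropMin = cell2mat(accumarray(jj(lo),find(lo),[],@(x){x(min(B(x,1))==B(x,1))}));
% Edited predicate: selects every member except the maximum
dropNotMax = cell2mat(accumarray(jj(lo),find(lo),[],@(x){x(max(B(x,1))~=B(x,1))}));

A1 = A; A1(:,dropMin) = [];       % removes column 5 only; columns 1 and 3 still share a key
A2 = A; A2(:,dropNotMax) = [];    % removes columns 3 and 5; column 1 is the sole survivor
```

With groups of exactly two columns the two predicates give the same result; they differ only when a key occurs three or more times. If two columns tie for the maximum, the edited predicate keeps both.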

Dean Ranmar, 18 October 2018

Edited: Dean Ranmar, 18 October 2018

Stephen Cobeldick: Yes, I know. I sneaked it in by displaying the sample values in the above comment. Sorry to you and Andrei Bobrov. (I was lazy in creating the sample array - I created a random array and it was easier to use all integers in order for at least one pair to match.) It doesn't matter but, that wasn't the problem, BTW.

Answer 2

dpb, 18 October 2018

https://kr.mathworks.com/matlabcentral/answers/424717-find-unique-sets-of-values-in-matrix-eliminate-duplications#answer_342090

Edited: dpb, 18 October 2018

Well, until the last step...you lost me there as to what you actually want as final result but can at least identify who you're looking for...

Your first inclination with unique is good:

ir1=5:6;                                    % rows that define the key
ir2=3;                                      % row used to rank matching columns
[u,ia,ib]=unique(A(ir1,:).','rows');        % find the combinations in 1st subset
if size(u,1)==numel(ib), return, end        % every key is unique: nothing to remove
n=histc(ib,1:length(ia));                   % count occurrences
ic=find(ib==find(n>1));                     % columns (in A) sharing the repeated key; assumes exactly one key repeats
[~,imn]=min(A(ir2,ic));                     % and the index to the minimum of those columns
icmn=ic(imn);                               % the column in A of min A(ir2,ic)

Alternatively, you can find which is the largest of those simply by replacing MIN() w/ MAX() (and an appropriate change in return variable name, of course).

If I read the request correctly, there's no guarantee that the column you want to eliminate is the last one, so the expression 1:9 you used above wouldn't be general. The easier coding would be to use MAX():

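[~,imx]=max(A(ir2,ic));     % position within ic of the column to keep
ic(imx)=[];                 % take it off the removal list
A(:,ic)=[];                 % remove the remaining duplicates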

would, I think, be the requested result.
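The snippet above handles one repeated key. As a rough sketch of how the same idea might extend to several repeated keys, the sortrows route mentioned in the question can be combined with unique. The matrix and the names keyRows, rankRow, T, order, first and keep are illustrative assumptions, not part of the original answer.

```matlab
% Hypothetical 3-by-6 matrix: row 1 ranks the columns, rows 2:3 form the key.
% Key (7,8) occurs in columns 1, 3, 5; key (1,2) occurs in columns 2, 6.
A = [9 4 7 2 5 8
     7 1 7 3 7 1
     8 2 8 6 8 2];
keyRows = 2:3;
rankRow = 1;

T  = [A(keyRows,:).', A(rankRow,:).'];            % one row per column of A: [key, rank]
nk = numel(keyRows);
[~,order] = sortrows(T,[1:nk, -(nk+1)]);          % key ascending, rank descending
[~,first] = unique(T(order,1:nk),'rows','first'); % first row of each key block has the largest rank
keep = sort(order(first));                        % restore the original column order
A = A(:,keep);                                    % one column per key, the one with the max rank
```

Because exactly one column is taken per key, a tie for the maximum keeps a single column (the earliest one, since sortrows preserves the original order of equal rows), which differs from the predicate-based deletion in Answer 1.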

Comments: 4 (2 older comments not shown)

Dean Ranmar, 21 October 2018

I agree with your reasoning and I see that I have used histc in the past with no issues; this is actually the first time I've seen histcounts. I am testing the two approaches today, but with histc only. This is necessary because throughput is an issue in my program.

Bruno Luong, 21 October 2018

This again shows a wrong design decision by TMW.

HISTC uses a consistent binning rule (left <= value < right) for all the bins, whereas HISTCOUNTS makes an exception for the last bin (left <= value <= right).

When an exception like this occurs, it requires all kinds of special-case code to handle it.
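A small sketch of the difference, using made-up data (x, edges, nc, nh and nc2 are illustrative names). Note that histc additionally returns one final count for values exactly equal to the last edge, so it yields one more element than histcounts for the same edges.

```matlab
x     = [1 2 2 3];
edges = 1:3;

nc = histcounts(x,edges);    % 2 bins: [1,2) and [2,3]; the value 3 falls into the last bin
nh = histc(x,edges);         % 3 counts: [1,2), [2,3), and x == 3

% Counting integer labels 1..max with histcounts therefore needs one extra edge,
% which is why Answer 1 uses 1:max(c)+1
nc2 = histcounts(x,1:max(x)+1);
```

Here nc is [1 3], while nh and nc2 are both [1 2 1].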
