Compare for uniqueness between 2 very large matrices

I have two matrices of the same size, 95 x 100,000.
Their columns are in different orders, and I would like to check whether any column of one matrix is repeated anywhere in the other matrix, or whether the two matrices share no columns at all.

Comments: 2

You can use Lia = ismember(A,B,'rows'), which returns a logical column vector with one element per row of A; Lia(k) is true when row k of A also appears as a row of B.
Because you want to compare columns, you will need to transpose your matrices first.
I do not understand this sentence: "I want to know if the columns are completely unique - even if the same rows are filled in each they could have different values."
ismember(x,y,'rows') searches for equal rows in both matrices. After transposing, this is exactly what "if columns in one matrix are repeated elsewhere in the other matrix" means, isn't it?
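As a minimal sketch of the ismember suggestion above, the following uses two small hand-made matrices (the values and the variable names isInB, locB and anyShared are illustrative assumptions, not the asker's data). The second output of ismember gives, for each column of A, the index of a matching column in B, or 0 if there is none.

```matlab
% Illustrative data: columns of A are [1;4], [2;5], [3;6]
A = [1 2 3; 4 5 6];
% Columns of B are [3;6], [1;4], [1;0]
B = [3 1 1; 6 4 0];

% Transpose so that columns become rows, then compare whole rows
[isInB, locB] = ismember(A.', B.', 'rows');

% True if at least one column of A also occurs in B
anyShared = any(isInB);
```

Here isInB(k) refers to column k of A, and locB(k) is the matching column number in B. Note that ismember does not treat NaN values as equal to each other.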


Answers (1)

Adam Danz on 13 Jan 2021
Edited: Adam Danz on 13 Jan 2021
To determine if two arrays are 100% identical, use isequal, or use isequaln to treat NaN values as equal to each other.
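The difference between the two functions only shows up when NaN is present. A small sketch with made-up values (X, Y, tfStrict and tfNaN are illustrative names):

```matlab
% Two arrays with identical contents, including a NaN
X = [1 NaN; 3 4];
Y = X;

% isequal follows IEEE rules: NaN is never equal to NaN
tfStrict = isequal(X, Y);

% isequaln treats NaN values in the same positions as equal
tfNaN = isequaln(X, Y);
```

With this data tfStrict is false and tfNaN is true.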
To determine if columns in matrix A are found in matrix B:
% create demo data
rng('default')
A = randi(3,3,20);
B = randi(4,3,20);
% preallocate, then test each column of A against all columns of B
isInB = false(1,size(A,2));
for i = 1:size(A,2)
    isInB(i) = ismember(A(:,i)', B','rows');
end
isInB is a 1xn logical vector for n columns of A where isInB(i) indicates whether column i of A is found in B.
To find the column numbers in B that match each column number in A:
isMatchInB = false(size(A,2),size(B,2));
for i = 1:size(A,2)
    isMatchInB(i,:) = arrayfun(@(j)isequaln(A(:,i),B(:,j)),1:size(B,2));
end
isInB = any(isMatchInB,2);
Here isInB is an nx1 logical vector for n columns of A where isInB(i) indicates whether column i of A is found in B.
isMatchInB is an nxm logical matrix (n columns in A, m columns in B) where isMatchInB(i,j) indicates whether column i of A matches column j of B.
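To read the matching column pairs out of such a match matrix, find can be applied to it. The sketch below rebuilds a small match matrix from made-up data (the values and the names colA and colB are illustrative assumptions):

```matlab
% Illustrative data: A(:,1) equals B(:,2), and A(:,3) equals B(:,1)
A = [1 2 3; 4 5 6];
B = [3 1 1; 6 4 0];

% Build the match matrix column by column, as in the loop above
isMatchInB = false(size(A,2), size(B,2));
for i = 1:size(A,2)
    isMatchInB(i,:) = arrayfun(@(j) isequaln(A(:,i), B(:,j)), 1:size(B,2));
end

% Row subscripts are column numbers in A, column subscripts are column numbers in B
[colA, colB] = find(isMatchInB);
```

Each pair (colA(k), colB(k)) identifies one match; find lists them in column-major order of isMatchInB.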

Comments: 3

This appears to work on small data sets, but I get this error:
Error using false
Requested 100274x100274 (9.4GB) array exceeds maximum array size preference. Creation of
arrays greater than this limit may take a long time and cause MATLAB to become unresponsive.
Error in uniquetest (line 7)
isMatchInB = false(size(A,2),size(B,2));
You could try storing the subscript indices instead:
c = cell(1,size(A,2));
for i = 1:size(A,2)
c{i} = find(arrayfun(@(j)isequaln(A(:,i),B(:,j)),1:size(B,2)));
end
Or use a sparse logical matrix in place of the full one:
isMatchInB = sparse([],[],false(0),size(A,2),size(B,2));
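As an alternative sketch that avoids both the full n-by-m array and the nested comparisons, the columns of both matrices can be labeled with group numbers via unique(...,'rows') and the sparse match matrix built from those labels. This swaps in a different technique from the loops above, and the data and the names g, gA, gB, SA and SB are illustrative assumptions. Unlike isequaln, unique does not treat NaN values as equal, so columns containing NaN would never match here.

```matlab
% Illustrative data: A(:,1) equals B(:,2), and A(:,3) equals B(:,1)
A = [1 2 3; 4 5 6];
B = [3 1 1; 6 4 0];
nA = size(A,2);
nB = size(B,2);

% Give every distinct column (of A and B together) a group number
[~, ~, g] = unique([A.'; B.'], 'rows');
gA = g(1:nA);          % group number of each column of A
gB = g(nA+1:end);      % group number of each column of B
nG = max(g);           % number of distinct columns overall

% Sparse indicator matrices: one nonzero per column, placed at its group number
SA = sparse((1:nA).', gA, 1, nA, nG);
SB = sparse((1:nB).', gB, 1, nB, nG);

% Entry (i,j) is nonzero exactly when column i of A and column j of B share a group
isMatchInB = (SA * SB.') > 0;
isInB = full(any(isMatchInB, 2));
```

The result stays sparse as long as matches are rare; if most columns of A matched most columns of B, it would fill up and the memory problem would return.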


Asked: 13 Jan 2021

Edited: 15 Jan 2021
