speed up ismember with two-dim data

조회 수: 12 (최근 30일)
Felix Müller
Felix Müller 2021년 7월 13일
댓글: Felix Müller 2021년 7월 15일
I have two large cell-arrays (X and Y) where each field is a two dimensional array representing pixel positions. The cell arrays are not the same size (can be in the range of thousands) and each entry can be be a different length (ranging from length 1 to length >1000). Maybe relevant: Over X and Y individually, no point can be included twice, i.e. all entries in X have no points in common (same for Y).
Now I want to find the overlap between all entries in X with all entries in Y. (in reality I filter the entries in Y for a criterion and compute only roughly 10-20 percent of the comparisons)
Is there a way to speed up this process? I have read about ismembc which requires sorted data (and thus I thought it cannot be used in my case).
Minimal working example:
X{1} = [1 1; 1 2; 1 3];
X{2} = [3 1; 3 2; 3 3];
Y{1} = [1 1; 1 2];
Y{2} = [2 3; 3 1; 3 2; 3 3];
Y{3} = [5 5];
mat_overlap = NaN(length(X), length(Y));
for ii = 1:length(X)
for jj = 1:length(Y)
mat_overlap(ii,jj) = sum(ismember(X{ii}, Y{jj}, 'rows'));
end
end
  댓글 수: 6
Jan
Jan 2021년 7월 14일
@Felix Müller: I assume, there is a typo in the code to produce the test data:
X{k} = random_vector(idx_first:idx_last, :);
% ^^^ added to get 2 columns
I've posted a comparison with ismembc and get a speedup of a factor 100 - much more then 60%.
Felix Müller
Felix Müller 2021년 7월 15일
True, typo! (I tested with my real data)
Yes, that can well be! I'll make it clearer in my answer above. I tested my whole routine which does other things as well besides only using ismember/ismembc and the 60% speed up was for that. For the raw comparison it will be much more but of course that is the interesting number here!

댓글을 달려면 로그인하십시오.

채택된 답변

Jan
Jan 2021년 7월 14일
function Test_Felix
X = TestData;
Y = TestData;
% Original method:
tic;
mat_overlap = NaN(length(X), length(Y));
for ii = 1:length(X)
for jj = 1:length(Y)
mat_overlap(ii,jj) = sum(ismember(X{ii}, Y{jj}, 'rows'));
end
end
toc
% Modified:
tic;
nX = numel(X); % Condense the 2 columns to 1:
X2 = cell(1, nX);
for iX = 1:nX
X2{iX} = X{iX}(:,1) * 5000 + X{iX}(:, 2);
end
nY = numel(Y);
Y2 = cell(1, nY);
for iY = 1:nY
Y2{iY} = Y{iY}(:,1) * 5000 + Y{iY}(:, 2);
end
mat_overlap2 = NaN(nX, nY);
for iY = 1:nY % Swap loops
Yi = sort(Y2{iY}); % Sort once only
for iX = 1:nX
mat_overlap2(iX, iY) = sum(ismembc(X2{iX}, Yi));
end
end
toc
isequal(mat_overlap, mat_overlap2)
end
function X = TestData
n = 200;
X = cell(1, n);
arr = randi([15, 1500], n, 1);
num_total = sum(arr);
% create vector with unique rows
random_vector = [];
while length(random_vector) < num_total
num_missing = num_total - length(random_vector);
append = round(4000*rand(num_missing, 2));
random_vector = unique([random_vector; append], 'rows');
end
% fill X with the correct different lengths
for k = 1:n
idx_first = sum(arr(1:(k-1))) + 1;
idx_last = sum(arr(1:k));
X{k} = random_vector(idx_first:idx_last, :);
end
end
Timings on Matlab R2018b, i5 mobile:
Elapsed time is 17.209408 seconds.
Elapsed time is 0.175748 seconds.
A factor of 100. Nice! I've reduced the data size from n=1000 to 200 for faster tests.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기

태그

제품


릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by