Massive time required for pdist

Sebastian Stumpf on 4 Oct 2021
Commented: Sebastian Stumpf on 6 Oct 2021
Hello,
I am using the MATLAB function pdist to calculate the distance between two points. However, I noticed that the function takes a lot of time, even though it uses all four cores. I built this example to demonstrate the massive time consumption. If I calculate the distance between two points with my own code, it is much faster. The example calculates the distance for every pair drawn from two sets of a thousand points each (one million distances in total).
clear
close all
clc
tic
j=1;
X = rand(1000,2);
Y = rand(1000,2);
fprintf('Time for array creation: ');
toc
tic
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
A(j,1) =sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
j = j+1;
end
end
fprintf('Time for own distance calculation: ');
toc
j = 1;
tic
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
B(j,1) = pdist(P,'euclidean');
j = j+1;
end
end
fprintf('Time for distance calculation using Matlab function pdist: ');
toc
Output:
Time for array creation: Elapsed time is 0.000386 seconds.
Time for own distance calculation: Elapsed time is 0.251026 seconds.
Time for distance calculation using Matlab function pdist: Elapsed time is 10.776532 seconds.
You can clearly see that the MATLAB function pdist takes over 10 seconds longer.
My question is: Why? What else is this function doing?
Would be nice to know.
Thank you very much
Kind regards,
Sebastian


Accepted Answer

Chunru on 4 Oct 2021
Edited: Chunru on 4 Oct 2021
%tic
X = rand(1000,2);
Y = rand(1000,2);
% fprintf('Time for array creation: ');
%toc
%% Version 1
tic
j=1;
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
A(j,1) =sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
j = j+1;
end
end
size(A)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for own distance calculation: %.6f\n', t);
Time for own distance calculation: 0.307268
%% Version 1.1
% Pre-allocate A
tic
j=1;
A = inf(size(X,1)*size(Y,1), 1);
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
A(j,1) =sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
j = j+1;
end
end
size(A)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for own distance calculation with preallocation: %.6f\n', t);
Time for own distance calculation with preallocation: 0.112437
%% Version 2
tic
j=1;
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
B(j,1) = pdist(P,'euclidean'); % one pair
j = j+1;
end
end
size(B)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for distance calculation using Matlab function pdist: %.6f\n', t);
Time for distance calculation using Matlab function pdist: 15.181589
%% Version 2.1
% Pre-allocate B beforehand
tic
j=1;
B = inf(size(X,1)*size(Y,1), 1);
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
B(j,1) = pdist(P,'euclidean');
j = j+1;
end
end
size(B)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for distance calculation using Matlab function pdist: %.6f\n', t);
Time for distance calculation using Matlab function pdist: 12.980660
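Even with preallocation, Version 2.1 spends about 13 s on 1,000,000 calls, roughly 13 microseconds per call. That is consistent with a fixed per-call cost in pdist (most likely argument parsing and input checks) rather than with the distance arithmetic itself. The sketch below is one rough way to estimate that cost with timeit; the variable names tPdist and tInline are illustrative, and the numbers will differ from machine to machine.

```matlab
% Rough estimate of the fixed cost of one pdist call on a 2-by-2 input.
% timeit (base MATLAB) calls the handle repeatedly and returns a typical time.
% Note: timeit may warn that the inline expression is too fast to measure
% accurately; the comparison is only meant as an order-of-magnitude check.
P = rand(2, 2);                                         % two points in 2-D

tPdist  = timeit(@() pdist(P, 'euclidean'));            % toolbox function, one pair
tInline = timeit(@() sqrt(sum((P(1,:) - P(2,:)).^2)));  % same distance, plain arithmetic

fprintf('pdist per call:  %.2e s\n', tPdist);
fprintf('inline per call: %.2e s\n', tInline);
fprintf('estimated cost of 1e6 pdist calls: %.2f s\n', 1e6 * tPdist);
```

If the estimate for 1e6 calls lands in the same range as the loop timing above, the slowdown comes from calling pdist a million times, not from the distance computation.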
%% Version 3
% pdist on many points at once: this computes the distances x2-x1, x3-x1, ..., x1000-x1,
% y1-x1, ..., y1000-x1; then x3-x2, ..., x1000-x2, ..., y1000-x2, etc.
% See: doc pdist
tic
p = pdist([X; Y]); % dist
size(p)
ans = 1×2
1 1999000
t = toc;
fprintf('Time for distance calculation using Matlab function pdist (many points): %.6f\n', t);
Time for distance calculation using Matlab function pdist (many points): 0.016222
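Note that Version 3 returns 1,999,000 distances because it also includes the X-to-X and Y-to-Y pairs, whereas the loops above compute only the 1,000,000 Y-to-X distances. A minimal sketch of two ways to get just those cross distances follows, assuming pdist2 is available (it ships in the same Statistics and Machine Learning Toolbox as pdist) and, for the toolbox-free variant, implicit expansion (R2016b or later). The names D, D2 and A2 are illustrative.

```matlab
X = rand(1000, 2);
Y = rand(1000, 2);

% Cross distances only: D(i,k) is the distance between Y(i,:) and X(k,:).
D = pdist2(Y, X, 'euclidean');          % 1000-by-1000

% Same ordering as the loop result A (k runs fastest, then i).
A2 = reshape(D.', [], 1);               % 1000000-by-1

% Toolbox-free alternative using implicit expansion:
% a column minus a row expands to the full 1000-by-1000 difference matrix.
D2 = sqrt((Y(:,1) - X(:,1).').^2 + (Y(:,2) - X(:,2).').^2);

% Spot check one pair against the scalar formula from the question.
i = 3; k = 7;
ref = sqrt((Y(i,1) - X(k,1))^2 + (Y(i,2) - X(k,2))^2);
fprintf('max |D - D2| = %.3g\n', max(abs(D(:) - D2(:))));
fprintf('|A2 - ref|   = %.3g\n', abs(A2((i-1)*size(X,1) + k) - ref));
```

Both variants make a single vectorized call instead of a million small ones, which is the same reason Version 3 is so much faster than Version 2.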
Sebastian Stumpf on 6 Oct 2021
Thank you for your detailed answer. It looks like I didn't use the function very efficiently.
Kind regards


Release: R2020a
