Speed up distance calculation in N-dimensional space

Question

0 개 추천

Hi all,

I have a function I'd like your help to speed up. The function takes two vectors x and Y of points in N-dimensional space as inputs (x and Y have the same size, typically N=500 by m=50,000). For every point in x I want to find the closest (according to squared difference) point in Y and return its number. The inputs are almost random. The function is :

function I = findnearest(x,Y)
I = zeros(size(x,1),1);
o = ones(size(Y,1),1);
for i = 1:size(x,1)
sqdiff = (o*x(i,:) - Y).^2;
xdiff = sum(sqdiff,2);
[~,I(i)] = min(xdiff);
end
end

I've tried to vectorize it but it turns out even slower than the current version. Any ideas on how to improve performance would be greatly appreciated! The function is called upon many times and as such takes immense computational time.

Thanks in advance!

Edit: I just realized that the part of the code that takes 99% of the time I only have N=5 to N=20... I'm terribly sorry for this confusion! The parts where N>=400 are not run that often so they are not that big a deal.

댓글 수: 7
이전 댓글 5개 표시 이전 댓글 5개 숨기기

John D'Errico 2015년 2월 28일

An important question is if BOTH sets of points change with each evaluation. If Y stays fixed for example, then one of the tree schemes might be very good, as then the cost of building the tree will be spread out over the many times it will be used.

Anders Österling 2015년 2월 28일

Hi again! The Y will sometimes be constant 1,000's of times, but will change every now and then. So then the trees might be good after all.

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Follow Question

Answer 1

Matt J 2015년 2월 28일

0 개 추천

See IPDM ( Download ) and maybe also the files it acknowledges and the files it inspired.

댓글 수: 5
이전 댓글 3개 표시 이전 댓글 3개 숨기기

Matt J 2015년 2월 28일

편집: Matt J 2015년 2월 28일

@Anders,

Matt - in really impressed by your use of bsxfun in my other question!

Perhaps you could Accept-click the answer then :D ...

As for the rest of your last post, IPDM does use bsxfun, so if John doesn't recommend it, I'm not sure how much to hope for. Arrayfun is not designed for high speed computation, so I wouldn't recommend it for what you are trying to do.

The only thing I can think of at this point is to divide the task with parfor in combination with Worker Object Wrapper ( Download ). The Worker Object Wrapper will allow you to make Y persist on the parallel workers across multiple calls to parfor, so you don't have to rebroadcast it each time you repeat the parfor loop. Since you say Y changes rarely, this will save you some overhead.

Anders Österling 2015년 3월 6일

MATLAB Online에서 열기

Hi again,

I'm sorry for a late reply - something else came up that I needed to deal with before this. Thank you soo much John and Matt! Its getting better, but I'm not yet there.

I've tried a bunch of different things but still haven't found anything that is quite as fast as I'd like it to be. My homemade findnearest is an obviously bad first attempt. I also tried John's ipdm, as well as the built in functions createns/knnsearch combo. Please see below for computational comparisons. I'll keep looking and will report back if I find anything faster than the createns/knnsearch. If anyone has additional ideas on how to improve this, feel free to comment below. [Approximate nearest neighbour might be good enough for me, so I might try to compile FLANN (Link)]. The 2.5seconds might not seem that bad at first, but they are very unwelcome when repeated hundreds of thousands of times!

disp('50000 by 1')
A=rand(50000,1); B = rand(50000,1);
tic; 
disp('Built in: createns/knnsearch')
disp(' -- Grow tree --')
kdtree=createns(B,'NSMethod','kdtree','Distance','Euclidean'); 
toc
tic
disp(' -- Search --')
res1=knnsearch(kdtree,A)';
toc
tic;
disp('Johns IPDM')
res2=ipdm(A,B,'Subset','nearest','result','struct')';
toc
tic; 
disp('My findnearest')
res3=findnearest(A,B); 
toc
disp(' ')
disp('50000 by 8')
A=rand(50000,8); B = rand(50000,8);
tic;
disp('Built in: createns/knnsearch')
disp(' -- Grow tree --')
kdtree=createns(B,'NSMethod','kdtree','Distance','Euclidean'); 
toc
tic
disp(' -- Search --')
res1=knnsearch(kdtree,A)';
toc
tic;
disp('Johns IPDM')
res2=ipdm(A,B,'Subset','nearest','result','struct')';
toc
tic; 
disp('My findnearest')
res3=findnearest(A,B); 
toc

Results

50000 by 1

Built in: createns/knnsearch

-- Grow tree --

Elapsed time is 0.069353 seconds.

-- Search --

Elapsed time is 0.055465 seconds.

Johns IPDM

Elapsed time is 0.029600 seconds.

My findnearest

Elapsed time is 12.559964 seconds.

50000 by 8

Built in: createns/knnsearch

-- Grow tree --

Elapsed time is 0.128941 seconds.

-- Search --

Elapsed time is 2.504368 seconds.

Johns IPDM

Elapsed time is 175.717344 seconds.

My findnearest

Elapsed time is 295.582135 seconds.|

댓글을 달려면 로그인하십시오.

Speed up distance calculation in N-dimensional space

댓글 수: 7
이전 댓글 5개 표시 이전 댓글 5개 숨기기

답변 (1개)

댓글 수: 5
이전 댓글 3개 표시 이전 댓글 3개 숨기기

카테고리

태그

Community Treasure Hunt

Speed up distance calculation in N-dimensional space

댓글 수: 7 이전 댓글 5개 표시 이전 댓글 5개 숨기기

답변 (1개)

댓글 수: 5 이전 댓글 3개 표시 이전 댓글 3개 숨기기

카테고리

태그

참고 항목

Community Treasure Hunt

댓글 수: 7
이전 댓글 5개 표시 이전 댓글 5개 숨기기

댓글 수: 5
이전 댓글 3개 표시 이전 댓글 3개 숨기기