Speed up distance calculation in N-dimensional space
이전 댓글 표시
Hi all,
I have a function I'd like your help to speed up. The function takes two vectors x and Y of points in N-dimensional space as inputs (x and Y have the same size, typically N=500 by m=50,000). For every point in x I want to find the closest (according to squared difference) point in Y and return its number. The inputs are almost random. The function is :
function I = findnearest(x,Y)
I = zeros(size(x,1),1);
o = ones(size(Y,1),1);
for i = 1:size(x,1)
sqdiff = (o*x(i,:) - Y).^2;
xdiff = sum(sqdiff,2);
[~,I(i)] = min(xdiff);
end
end
I've tried to vectorize it but it turns out even slower than the current version. Any ideas on how to improve performance would be greatly appreciated! The function is called upon many times and as such takes immense computational time.
Thanks in advance!
Edit: I just realized that the part of the code that takes 99% of the time I only have N=5 to N=20... I'm terribly sorry for this confusion! The parts where N>=400 are not run that often so they are not that big a deal.
댓글 수: 7
Said BOUREZG
2015년 2월 28일
What's the aim of this function?
Image Analyst
2015년 2월 28일
What is the size of x and y, and what does "immense" mean to you (how many microseconds)?
Anders Österling
2015년 2월 28일
John D'Errico
2015년 2월 28일
In my tests on problems of that size, it appears to take on the order of a minute for each computation.
Anders Österling
2015년 2월 28일
John D'Errico
2015년 2월 28일
An important question is if BOTH sets of points change with each evaluation. If Y stays fixed for example, then one of the tree schemes might be very good, as then the cost of building the tree will be spread out over the many times it will be used.
Anders Österling
2015년 2월 28일
답변 (1개)
Matt J
2015년 2월 28일
0 개 추천
댓글 수: 5
John D'Errico
2015년 2월 28일
편집: John D'Errico
2015년 2월 28일
Oh. I was going to answer this, then I went off to have some breakfast and forgot about it. I'm not sure whether IPDM will be any faster, as the algorithm it uses for a nearest neighbor is not really that great.
I'd wonder about looking for a good k-d tree code instead, which might be able to help here. A problem with a kd-tree is it may not be efficient in a high number of dimensions (500) as is seen here. Ball-trees might also be worth looking into, but I've not seen much of them.
John D'Errico
2015년 2월 28일
Some further thought suggests that the tree search schemes might not be as useful as I might have hoped. These schemes have the flaw that they are inefficient as the number of dimensions rises, forcing the code to test too many points. So in a very high number of dimensions (500 is very high in this context) you won't see much improvement at all. As well, the branching ensures no gain can come from any vectorization at all.
Anders Österling
2015년 2월 28일
@Anders,
Matt - in really impressed by your use of bsxfun in my other question!
Perhaps you could Accept-click the answer then :D ...
As for the rest of your last post, IPDM does use bsxfun, so if John doesn't recommend it, I'm not sure how much to hope for. Arrayfun is not designed for high speed computation, so I wouldn't recommend it for what you are trying to do.
The only thing I can think of at this point is to divide the task with parfor in combination with Worker Object Wrapper ( Download ). The Worker Object Wrapper will allow you to make Y persist on the parallel workers across multiple calls to parfor, so you don't have to rebroadcast it each time you repeat the parfor loop. Since you say Y changes rarely, this will save you some overhead.
Anders Österling
2015년 3월 6일
카테고리
도움말 센터 및 File Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!