Efficient method for finding index of closest value in very large array for a very large amount of examples
이전 댓글 표시
I have two very large one dimensional arrays, 'aRef' which is around 11,000,000 elements and 'aTest' which is around 10,000,000 elements. I need to find the index of the closest element in 'aRef' for all elements in 'aTest'. 'aRef' is sorted and 'aTest' can be sorted if that will help performance.
- Method 1: Returns at out of memory error as the arrays are far too large
diff = abs(bsxfun(@minus,aRef,aTest'));
[~, I] = min(diff);
- Method 2: Takes around 0.03 seconds per iteration (but varies greatly) and therefore around 300000 seconds in total
for k = 1:n
diff = abs(aRef- aTest(k));
[~, I(k)] = min(diff);
end
- Method 3: Takes around 0.013 seconds per iteration and therefore 130000 seconds in total
for k = 1:n
i_lower = find(aRef <= aTest(k),1,'last');
i_higher = find(aRef >= aTest(k),1,'first');
end
Is there a more efficient method for this that won't exhaust the memory or take so long to run?
Thanks for your help.
댓글 수: 1
채택된 답변
추가 답변 (3개)
Damon Landau
2018년 8월 8일
I = interp1(aRef, 1:numel(aRef), aTest, 'nearest', 'extrap');
should be faster and (arguably) more straightforward than "discretize"
James Tursa
2015년 5월 22일
편집: James Tursa
2015년 5월 22일
This is a fairly simple mex routine if you start with both arrays sorted, since you don't need to make comparisons between all of the elements to get the answer. A simple stepping process can get the job done. E.g., the timing on the sizes you have listed:
>> aRef = sort(rand(11000000,1));
>> aTest = sort(rand(10000000,1));
>> tic; aClosest = closest(aRef,aTest); toc
Elapsed time is 0.130508 seconds.
A shorter example to see that the results are as expected:
>> aRef = 1:10
aRef =
1 2 3 4 5 6 7 8 9 10
>> aTest = sort(rand(1,20)*20-5)
aTest =
Columns 1 through 15
-4.4987 -2.2252 -2.0507 -1.2286 -1.0066 -0.9144 -0.2292 -0.0079 4.0035 4.6724 5.1178 5.3893 5.6688 7.4530 10.6621
Columns 16 through 20
10.9377 12.8271 12.9305 13.5336 14.6298
>> aClosest = closest(aRef,aTest)
aClosest =
1 1 1 1 1 1 1 1 4 5 5 5 6 7 10 10 10 10 10 10
The mex routine is below. (Note the caveat about not handling Inf's and NaN's consistently, although that code could be added if needed)
// closest.c
// C = closest(A,B)
// Inputs: A = full real double vector (reference), sorted ascending
// B = full real double vector (test), sorted ascending
// Output: C = index of closest value in A to value in B
// i.e., A(C(i)) is the closest value in A to B(i)
// Ties are resolved in favor of higher index
// Is not currently coded to handle Inf's and NaN's consistently
// C is the same size as B
// Programmer: James Tursa
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
size_t i, j, k, m, n;
double *A, *B, *C;
//
// Check input arguments
if( nrhs != 2 ||
!mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||
mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||
mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ) {
mexErrMsgTxt("Need exactly two full double vectors as input");
}
// Check number of output arguments
if( nlhs > 1 ) {
mexErrMsgTxt("Too many outputs");
}
// Get the number of elements involved
m = mxGetNumberOfElements(prhs[0]);
n = mxGetNumberOfElements(prhs[1]);
// Disallow empty reference vector
if( m == 0 ) {
mexErrMsgTxt("Reference vector (1st input) cannot be empty");
}
// Create uninitialized output
if( mxGetM(prhs[1]) == 1 ) {
plhs[0] = mxCreateUninitNumericMatrix( 1, n, mxDOUBLE_CLASS, mxREAL );
} else {
plhs[0] = mxCreateUninitNumericMatrix( n, 1, mxDOUBLE_CLASS, mxREAL );
}
// If B is empty, simply return empty result
if( n == 0 ) {
return;
}
// Get the data pointers
A = mxGetPr(prhs[0]);
B = mxGetPr(prhs[1]);
C = mxGetPr(plhs[0]);
k = 0;
// Assign 1st index to all values B that are less than 1st A value
while( k < n && B[k] < *A ) {
C[k++] = 1.0;
}
// Step through until B is between two A values, then test for result
i = 0;
for( j=k; j<n; j++ ) {
while( i+1 < m && B[j] >= A[i+1] ) i++;
if( i+1 == m ) break;
if( B[j] - A[i] < A[i+1] - B[j] ) {
C[j] = i + 1;
} else {
C[j] = i + 2;
}
}
// Assign last index to all values B that are more than last A value
while( j < n ) {
C[j++] = m;
}
}
Jos (10584)
2015년 5월 29일
0 개 추천
I point you to my NEARESTPOINT function, available on the File Exchange:
카테고리
도움말 센터 및 File Exchange에서 Logical에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!