finding the closest value

Question

I am working with very large data sets. For each number in the first data set, I need to pull the matching number from a second data set. If a number from the first set has no match in the second, it should instead pull the next highest number from the second data set.

Comments: 2

David Hill on 8 Sep 2022

Do all numbers in the first data set need a corresponding number from the other data set (either a match or the next largest)?

jason on 8 Sep 2022

Yes, all the numbers must have a corresponding number.

Answer 1

Walter Roberson on 8 Sep 2022

Sort both data sets, then walk through them together, advancing first in one set and then in the other. For A(1), find the first entry B(K) that is >= A(1). Take that B(K) and scan through A(2), A(3), and so on until you find A(J) > B(K); every A value passed along the way is matched to B(K). Then switch and search forward in B for the first entry that is >= A(J); any B entries skipped in between are ignored.

The algorithm should not be difficult.
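The sorted walk described above could be sketched as follows. This is only an illustration under assumptions: the sample values of A and B are made up, and returning NaN when no entry of B is large enough is a convention chosen here, not something the thread specifies.

```matlab
A = [3 7 10 1 7];            % made-up query values
B = [12 2 5 7];              % made-up lookup values

[sA, order] = sort(A(:));    % order(j) is the position of sA(j) in the original A
sB = sort(B(:));

C = nan(size(A));            % assumed convention: NaN means no B is >= this A
k = 1;                       % position in sB; it only ever moves forward
for j = 1:numel(sA)
    % skip entries of B that are too small for the current A value
    while k <= numel(sB) && sB(k) < sA(j)
        k = k + 1;
    end
    if k > numel(sB)
        break                % B is exhausted, so the remaining A values stay NaN
    end
    C(order(j)) = sB(k);     % exact match or next highest value
end
```

Because k never moves backward, the loop itself is linear in numel(A) + numel(B), and the two sorts dominate the running time.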

There is a vectorized way to proceed, but it would require memory proportional to numel(A) by numel(B) and you indicated that you have a "very large" data set, so it seems unlikely you would be wanting to use that technique.

There is another approach using interp1() with 'next', which would probably look something like this:

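sB = unique(B);                    % sorted, duplicates removed: interp1 needs distinct sample points
C = interp1(sB, sB, A, 'next');    % NaN for any A outside [min(B), max(B)]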
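A small worked example of the interp1 idea, offered as a sketch rather than a tested solution for the original data. The sample values are invented. Two adjustments are assumptions added here: unique() is used in place of a plain sort because interp1 needs distinct sample points, and queries below min(B) are clamped up to min(B), since interp1 does not extrapolate by default for this method and would otherwise return NaN for them.

```matlab
A = [10 15 5 30 25];       % made-up query values
B = [20 10 30 20];         % made-up lookup values, unsorted and with a duplicate

sB = unique(B);            % sorted ascending, duplicates removed
Aq = max(A, sB(1));        % assumption: queries below min(B) should map to min(B)
C  = interp1(sB, sB, Aq, 'next');   % stays NaN wherever A > max(B)
```

Queries that coincide with an entry of B come back unchanged, so exact matches and next-highest lookups are handled in a single call.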

Comments: 2

jason on 8 Sep 2022

How would you create it the vectorized way?

Walter Roberson on 8 Sep 2022

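D = B(:) - A(:).';   % D(i,j) = B(i) - A(j), size numel(B)-by-numel(A)
D(D < 0) = inf;      % rule out entries of B that are smaller than A(j)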

Now take min(D, [], 1), the minimum along the first dimension, and request its second output to get the indices. Then use those indices to index B to find the actual values.
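Put together, the vectorized steps might look like the sketch below. The sample data is made up, and marking unmatched entries with NaN is an assumed convention. Note that the difference is taken here as B minus A, so that the nonnegative entries are exactly the candidates that are equal to or larger than each A value.

```matlab
A = [3 7 10 1];              % made-up query values
B = [2 5 7 12];              % made-up lookup values

D = B(:) - A(:).';           % D(i,j) = B(i) - A(j); implicit expansion needs R2016b or later
D(D < 0) = inf;              % entries of B below A(j) are not candidates

[d, idx] = min(D, [], 1);    % smallest nonnegative gap in each column, and its row
C = B(idx);                  % the matching or next-highest value for each A
C(isinf(d)) = NaN;           % assumed convention: no entry of B is >= this A
```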

If your A had 30000 entries and your B had 5000 entries, then D would hold 150 million doubles at 8 bytes each, which is about 1.2 gigabytes.

Vectorized does not always mean "most efficient": the above code compares every entry of A to every entry of B and has to scan all of the results to find the closest, which takes O(m*n) time. By contrast, sorting A and B and proceeding incrementally as I describe is dominated by the sorts, so it is O(m*log(m)) or O(n*log(n)), whichever is larger.

댓글을 달려면 로그인하십시오.

Answer 2

David Hill on 8 Sep 2022

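data3 = data1;                           % exact matches keep their own value
S = sort(data2);
idx = find(~ismember(data1, data2));     % entries of data1 with no exact match in data2
for k = idx(:).'                         % force a row so the loop visits one index at a time
    f = find(S > data1(k), 1);           % first (smallest) entry of S above data1(k)
    if isempty(f)
        data3(k) = NaN;                  % assumption: NaN when nothing in data2 is larger
    else
        data3(k) = S(f);
    end
end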
