Improving Efficiency of Find Algorithm

조회 수: 11 (최근 30일)
Matt C
Matt C 2021년 8월 19일
답변: Cris LaPierre 2021년 8월 19일
Hello,
I am aware that logical indexing is much faster than the usage of the find function, in specific instances. I'm wondering if there is a way to improve the following algorithm - I'm not quite sure how to use indexing (if possible) in this situation.
What I have is a matrix of ascending values, though some of those values may be repeated (specifically, I have millions of ascending timestamps with many repeated). I am then seeking the start and end indices of a window that is between time X and Y.
Here is an example of the algorithm that I currently have implemented:
myDataTimestamps = [10 20 30 30 30 40 50 60 60 60 70 70 80 90];
window_start_time = 30;
window_end_time = 80;
start_index = find(myDataTimestamps >= window_start_time,1,'first');
end_index = find(myDataTimestamps <= window_end_time,1,'last');
Is there a way to improve the speed of this code and still return the same start_index of 3 and end_index of 13?
Much appreciated!

답변 (2개)

Cris LaPierre
Cris LaPierre 2021년 8월 19일
This approach may only work for this simple case, but here's a way to do it using max/min.
myDataTimestamps = [10 20 30 30 30 40 50 60 60 60 70 70 80 90];
window_start_time = 30;
window_end_time = 60;
% find start/end index
ind = 1:length(myDataTimestamps);
wind = myDataTimestamps==window_start_time | myDataTimestamps==window_end_time;
start_index = min(ind(wind))
start_index = 3
end_index = max(ind(wind))
end_index = 10
  댓글 수: 1
Matt C
Matt C 2021년 8월 19일
편집: Matt C 2021년 8월 19일
I haven't been able to do a comparison, but that wasn't the massive improvement that I was hoping for. I had cancelled the script early without grabbing a total runtimes for comparison, but it looked like the proposed algorithm was going to take just as long (if not longer) than the find function. I implemented the recommendation as:
myDataTimestamps = [10 20 30 30 30 40 50 60 60 60 70 70 80 90];
window_start_time = 30;
window_end_time = 60;
% find start/end index
ind = 1:length(myDataTimestamps);
start_index = min(ind(myDataTimestamps>=window_start_time));
end_index = max(ind(myDataTimestamps<=window_end_time));
Have I blown anything in my above implementation? Note that my processor loading was ~50%, and only ~1 GB of my 24 GB of RAM was being used.
Edit: I can confirm that it took much longer using the min/max method. My code took ~45 minutes to fully execute using 'find', whereas it had only completed ~25% after about 2 hours using the min/max method.

댓글을 달려면 로그인하십시오.


Cris LaPierre
Cris LaPierre 2021년 8월 19일
I wonder if this is a scenario where using tall data may help. See this page.

카테고리

Help CenterFile Exchange에서 Logical에 대해 자세히 알아보기

태그

제품


릴리스

R2015b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by