Fast way to perform multiple searches on a large array
I have a large time series array (10,000,000 elements) :
ts = [2; 1; 3; 4; 6; 7; .......]
I have a corresponding time array (same size as the above) :
times = [d1; d2; d3; d4; d5.......]
I have 2 arrays of start times and end times (also large ~ 30000 elements):
st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]
I need to create a new matrix with many many finds. Logic is :
results = NaN(300, numel(st));
for i=1:numel(st);
temp = ts(find(times > st(i) & times < en(i), 300, 'first'));
results(:,i) = temp;
end;
Is there any way I can do this faster (ideally without a loop)?
- I have a 64 bit version so I can try a large in-memory solution.
Many thanks in advance, Nigel
Comments: 8
Jan
4 Oct 2011
I assume "time" should be "times" inside the loop.
Nigel
4 Oct 2011
Andrei Bobrov
4 Oct 2011
What is the size of 'st'?
Nigel
4 Oct 2011
Jan
4 Oct 2011
Do the intervals [st(i):en(i)] overlap?
Nigel
4 Oct 2011
Daniel Shub
4 Oct 2011
Just to confirm: times, st and en are all sorted?
Nigel
4 Oct 2011
Accepted Answer
More Answers (2)
Jan
4 Oct 2011
Never let an array grow in each iteration! Pre-allocate the output:
results = NaN(300, numel(st));
for i = 1:numel(st) % Not size(st), which is a vector!
temp = ts(find(times > st(i) & times < en(i), 300, 'first'));
if length(temp) == 300
results(:, i) = temp;
else
results(1:length(temp), i) = temp;
end
end
results = results(~isnan(results));
If st and times are sorted, it wastes a lot of time to compare all values. But for vectorizing this, a very large matrix would be needed, such that I assume it will be slower than the loop.
Can you solve the problem by using HISTC?
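As a rough illustration of the HISTC idea, the sketch below locates all start positions in one call and then copies a fixed-length block per interval. It is only a sketch under assumptions not confirmed in the thread: times is sorted and strictly increasing, every interval holds at least nTake samples (so en is not checked), and the small arrays and the names nTake, bin and first are made up for the example.

```matlab
% Small stand-in data; the real arrays are far larger.
times = (1:20)';            % assumed sorted, strictly increasing time stamps
ts    = 10 * (1:20)';       % one sample per time stamp
st    = [2.5 7 11.2];       % interval start times
nTake = 3;                  % 300 in the original question

% bin(i) = k means times(k) <= st(i) < times(k+1); 0 means out of range.
[~, bin] = histc(st, times);
first = bin + 1;                             % first index with times > st(i)
first(st > times(end)) = numel(times) + 1;   % start lies after all samples

results = NaN(nTake, numel(st));
for i = 1:numel(st)
    last = min(first(i) + nTake - 1, numel(times));
    n = last - first(i) + 1;                 % <= 0 when no samples remain
    if n > 0
        results(1:n, i) = ts(first(i):last); % short columns stay NaN-padded
    end
end
```

The loop remains, but each iteration only copies a short block instead of comparing all of times against st(i). In newer MATLAB releases histc is marked as not recommended, so a replacement function may be preferable there.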
Comments: 6
Nigel
4 Oct 2011
Jan
4 Oct 2011
Well, then this answer wasted your time and mine. Please include the pre-allocation and all other relevant details when you post a code snippet and ask for a speed improvement. Otherwise attempts to optimize your code start from a very unrealistic point.
Nigel
4 Oct 2011
Teja Muppirala
4 Oct 2011
Since you say you know it will always have 300 entries, then this:
find(times > st(i) & times < en(i), 300, 'first');
May as well be this:
find(times > st(i), 300, 'first');
Daniel Shub
4 Oct 2011
and since times and st are sorted
(0:299) + find(times > st(i), 1, 'first')
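A minimal sketch of how this could slot into the loop, assuming times is sorted and that at least nTake samples follow every start time (otherwise the indexing below would fail). The small arrays and the names nTake, k and idx are made up for the example. Note that arithmetic operators bind more tightly than the colon operator, so 0:299+k is parsed as 0:(299+k); the range is therefore parenthesised here.

```matlab
% Small stand-in data; the real arrays are far larger.
times = (1:20)';            % assumed sorted time stamps
ts    = 10 * (1:20)';       % one sample per time stamp
st    = [2.5 7 11.2];       % interval start times
nTake = 3;                  % 300 in the original question

results = NaN(nTake, numel(st));
for i = 1:numel(st)
    k   = find(times > st(i), 1, 'first');  % first sample after the start
    idx = (0:nTake-1) + k;                  % nTake consecutive indices from k
    results(:, i) = ts(idx);
end
```

Because find stops at the first hit, only one index is returned per interval and the rest of the block follows from the sort order.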
Nigel
4 Oct 2011
Nigel
4 Oct 2011
Comments: 2
Bjorn Gustavsson
10 Oct 2011
Well then, at least do the consecutive 'find's on shortened sections of times (with 'offset' as in Daniel's example):
idx = find(times(offset:end) > st(i), 1,'first');
Then you'd get the benefit of searching over increasingly shorter arrays, but without losing the data.
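One way the running offset might be wired into the loop is sketched below. It assumes times and st are both sorted in ascending order, which is what lets each search resume where the previous one ended. The small arrays and the names nTake, rel, first and last are made up for the example.

```matlab
% Small stand-in data; the real arrays are far larger.
times = (1:20)';            % assumed sorted time stamps
ts    = 10 * (1:20)';       % one sample per time stamp
st    = [2.5 7 11.2];       % assumed sorted interval start times
nTake = 3;                  % 300 in the original question

results = NaN(nTake, numel(st));
offset = 1;                 % where the next search begins
for i = 1:numel(st)
    rel = find(times(offset:end) > st(i), 1, 'first');
    if isempty(rel)
        break;              % no samples after this start or any later one
    end
    first = offset + rel - 1;               % index into the full array
    last  = min(first + nTake - 1, numel(times));
    results(1:last-first+1, i) = ts(first:last);
    offset = first;         % later starts cannot match before this index
end
```

The index returned by find is relative to the shortened section, so it has to be shifted by offset - 1 before it is used on ts. Whether times(offset:end) creates a temporary copy on each pass, and how much that costs, is worth timing on the real data.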
Daniel Shub
10 Oct 2011
I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.