Searching a huge array with a for loop - How to increase speed?

Hi!
I have a huge datetime array with upwards of a hundred million timestamps of financial market quote updates. I want to count the number of updates falling into each hour of trading. I'm currently doing it as follows, which works but is glacially slow:
fmt = 'yyyyMMdd HH:mm:ss:SSS';   % input format of the timestamps
t  = '20130102 00:00:00:000';    % start date
te = '20170731 23:00:00:000';    % end date
% Start of every one-hour interval that is to be searched
kl = datetime(t, 'InputFormat', fmt):hours(1):datetime(te, 'InputFormat', fmt);
ys = zeros(length(kl), 1);       % preallocate one count per interval
for k = 1:length(kl)
    % Count the entries of Time within the hour starting at kl(k).
    % Note: isbetween includes both endpoints of the interval.
    ys(k) = sum(isbetween(Time, kl(k), kl(k) + hours(1)));
end
So essentially I'm making MATLAB search the huge array 'Time' and count all the entries within each one-hour period.
Is there any way to make my code more efficient and speed up the process? One solution would be to make MATLAB go through the data only 23 times (once for each trading hour) instead of 40000 times, but I can't figure out how to do that.

4 Comments

Please upload a sample of your Time data.
Ben on 10 Nov 2018
Edited: Ben on 10 Nov 2018
20130102 01:00:00:558
20130102 01:00:01:272
20130102 01:00:02:228
20130102 01:00:02:308
20130102 01:00:02:892
20130102 01:00:03:517
20130102 01:00:05:298
20130102 01:00:13:857
20130102 01:00:14:296
20130102 01:00:14:417
Time looks like this. It's high frequency data.
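For anyone wanting to reproduce this, here is a minimal sketch that turns the ten sample lines above into a small datetime array. It assumes the raw data is text in the format shown. The name raw is made up for illustration, and a cell array of character vectors is used so that it should also run on older releases such as R2016a.

```matlab
% The ten sample timestamps posted above, as text (raw is a hypothetical name).
raw = {'20130102 01:00:00:558'; '20130102 01:00:01:272'; ...
       '20130102 01:00:02:228'; '20130102 01:00:02:308'; ...
       '20130102 01:00:02:892'; '20130102 01:00:03:517'; ...
       '20130102 01:00:05:298'; '20130102 01:00:13:857'; ...
       '20130102 01:00:14:296'; '20130102 01:00:14:417'};

% Same InputFormat as in the question: yyyyMMdd date, SSS = milliseconds.
Time = datetime(raw, 'InputFormat', 'yyyyMMdd HH:mm:ss:SSS');

% Show the milliseconds when the array is displayed.
Time.Format = 'yyyy-MM-dd HH:mm:ss.SSS';
```

All ten entries fall into the hour starting at 01:00 on 2 Jan 2013, so any per-hour count of this sample should give a single bin containing 10.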
Are you sure it's working? I'm trying to run your code but am getting errors.
Hmm yeah definitely working for me. What's the error?


Accepted Answer

Bruno Luong on 10 Nov 2018
Edited: Bruno Luong on 10 Nov 2018
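% First and last bin edge as serial date numbers (unit: days)
ts = datenum('20130102 00:00:00:000', 'yyyymmdd HH:MM:SS:FFF');
te = datenum('20170731 23:00:00:000', 'yyyymmdd HH:MM:SS:FFF');
% Time is assumed to hold text timestamps here; for a datetime array,
% datenum(Time) needs no format argument
t = datenum(Time, 'yyyymmdd HH:MM:SS:FFF');
% Count the entries in each one-hour bin (1/24 of a day)
ys = histcounts(t,ts:1/24:te);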
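To see what the histcounts call does on a small scale, here is a sketch with made-up toy data (five quote times spread over three hours, not taken from the original post). With explicit edges, bin k counts the values with edges(k) <= t < edges(k+1), and the last bin also includes its right edge.

```matlab
% Hypothetical toy data, expressed as serial date numbers (unit: days).
ts = datenum(2013, 1, 2, 0, 0, 0);        % first bin edge, 00:00
t  = ts + [0.5 0.75 1.25 2.1 2.9] / 24;   % quotes at 00:30, 00:45, 01:15, 02:06, 02:54

edges = ts + (0:3) / 24;                  % four edges -> three one-hour bins
ys = histcounts(t, edges);                % one pass over t, no loop

disp(ys)                                  % 2 1 2
```

Two things may be worth checking against the loop in the question. Values outside the edges are not counted at all, so with te at 23:00 the final hour (23:00 to 24:00 on the last day) is left out unless one more edge is added. Also, isbetween includes both endpoints, so a timestamp falling exactly on the hour is counted twice by the loop but only once by histcounts.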

4 Comments

Ben on 10 Nov 2018
Edited: Ben on 10 Nov 2018
Yep. Two great ideas here: converting to numbers and then using histcounts. I didn't know that function.
However, histcounts(t,ts:1/24:te) didn't work for me. Instead, histcounts(t,40127) does the job by specifying the number of equally spaced bins manually (40127 is the number of hours in the sample).
Thanks a bunch, Bruno.
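A note on the difference between the two calls, as a sketch with toy numbers rather than the real data: when only a bin count is given, histcounts picks the edges itself from the range of the data, so the bins are not guaranteed to start exactly on the hour. Requesting the edges as a second output makes this easy to check.

```matlab
t = [0.2 0.4 1.3 2.6 2.7];              % hypothetical toy values, unit: hours

[nEdges, e1] = histcounts(t, 0:1:3);    % explicit edges 0, 1, 2, 3
[nCount, e2] = histcounts(t, 3);        % only the number of bins is given

disp(e1)   % the edges that were passed in
disp(e2)   % the edges chosen by histcounts; compare them with e1
```

If e2 does not match the hourly grid, the counts belong to slightly shifted intervals even though the number of bins is right.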
The correct format is 'yyyymmdd HH:MM:SS:FFF'.
I had just copied the format from your code and modified the millisecond field, which was obviously wrong.
If you were using release R2017a or later you wouldn't need to convert to serial date numbers. In R2017a we added support for directly calling histcounts on datetime and duration arrays. As part of that support you could directly say 'hour' as the BinMethod.
histcounts(yourDatetimeArray, 'BinMethod', 'hour')
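A small sketch of the datetime version, using made-up timestamps. It needs a release in which histcounts accepts datetime input (R2017a or later, according to the comment above). The second call passes explicit datetime edges, which keeps the bins tied to a chosen start and end.

```matlab
% Hypothetical toy quotes at 00:30, 00:45, 01:15, 02:06 and 02:54.
t0   = datetime(2013, 1, 2, 0, 0, 0);
Time = t0 + minutes([30 45 75 126 174]);

% Let histcounts build the one-hour bins itself.
[counts, edges] = histcounts(Time, 'BinMethod', 'hour');

% Or give the hourly edges explicitly as a datetime vector.
hourEdges = t0:hours(1):(t0 + hours(3));
countsExplicit = histcounts(Time, hourEdges);   % [2 1 2]
```

Either way there is no conversion to serial date numbers, and the returned edges are datetime values that can be used directly for labelling or plotting.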
+1 @Steven, informative.


Release: R2016a
Asked: Ben on 10 Nov 2018
Edited: 10 Nov 2018
