Extracting a wide table from a very tall datastore array, also trying KDB+

Question

Michael 2019년 8월 28일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/477985-extracting-a-wide-table-from-a-very-tall-datastore-array-also-trying-kdb

댓글: Michael 2019년 9월 12일

Hello,

I am currently storing about 40GB of data in a 4 column tall array with about 1B rows. I'm using a Matlab Tall Array Datastore, but I also plan on experimenting with KDB+.

Could someone comment on the best way to extract a wide table, with about 350,000 rows and about 20 columns by selecting some of the items in the first three columns and the corresponding values,e.g DATE = {29-Jul-1983:31-Aug-1983} & STOCK = AAPL US EQUITY & FIELD= {MOV_AVG_50D & PRICE}?

Thank You,

Michael

Input

DATE,STOCK,FIELD,VALUE
29-Jul-1983 00:00:00,BHP AT EQUITY,MOV_AVG_50D,0.8979
31-Aug-1983 00:00:00,BHP AT EQUITY,PRICE,0.9029
29-Jul-1983 00:00:00,IBM US EQUITY,MOV_AVG_50D,0.9106
31-Aug-1983 00:00:00,IBM US EQUITY,PRICE_50D,0.9154
29-Jul-1983 00:00:00,AAPL US EQUITY,MOV_AVG_50D,0.9227
31-Aug-1983 00:00:00,AAPL US EQUITY,PRICE,0.9311

Output:

DATE,MOV_AVG_50D,PRICE
29-Jul-1983 00:00:00,0.9227,NaN
31-Aug-1983 00:00:00,NaN,0.9311

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Michael 2019년 8월 29일

MATLAB Online에서 열기

This is what I tried so far.

ds = datastore('bigtall.csv','DatetimeType','text');
ds.SelectedFormats={'%q','%C','%C','%f'};
tds = tall(ds);tm.cputime(1)=cputime;
tic;
t = tds(tds.TICKER=='UNIT US EQUITY' & tds.FIELD=='EBITDA_MARGIN',:);
T = gather(t);
tm.cputime(2)=cputime;
tm.tictoc(2)= toc;
fprintf('%d minutes CPU, %d minutes\n',[diff(tm.cputime) tm.tictoc(2)]./60);
fprintf('Table size [%dx%d]\n',size(T));
T.DATE = datetime(T.DATE);
tb = unstack(T(:,{'DATE','VALUE','FIELD'}), 'VALUE', 'FIELD');
fprintf('Table size [%dx%d]\n',size(tb));

The output took 23 minutes.

Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 4).Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 20 min 44 sec
Evaluation completed in 21 min 26 sec
5 minutes CPU, 23 minutes
Table size [762x4]
Table size [762x2]

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Michael 2019년 9월 12일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/477985-extracting-a-wide-table-from-a-very-tall-datastore-array-also-trying-kdb#answer_391638

Hello,

Here's an update. Unfortunatly, or fortuntely depending on how you look at it, KDB+ is about 45x times faster (31 seconds vs. 23 minutes) and can be accessed through Matlab's datafeed toolbox.

Best,

Michael

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Michael 2019년 9월 12일

MATLAB Online에서 열기

This took 1 second instead of 23 minutes!

select from f where TICKER=`$("AAPL US EQUITY"), FIELD in (`MOV_AVG_50D,`HIGH), DATE within 1983.07.29 1983.08.31

댓글을 달려면 로그인하십시오.

Extracting a wide table from a very tall datastore array, also trying KDB+

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

채택된 답변

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

Extracting a wide table from a very tall datastore array, also trying KDB+

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

채택된 답변

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기