How to speed up table merge / concatenation?

Question

Andrea 2023년 2월 13일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1911550-how-to-speed-up-table-merge-concatenation

답변: Lei Hou 2023년 3월 26일

Hello everybody. I have three very large table which I would need to merge. The tables differ in size and variables, however I have a criterion for merging the three of them. Merge can go through concatenation, such as

ceiling = numel(table_1(:,1))
for i=1:ceiling
    
    idx = table_1.ID(i);
    idx_size = numel(idx);
    table_2_idx = find(table_2.ID == idx);
    table_3_idx = find(table_3.ID == idx);
    
    block_table_2 = repelem(table_2(table_2_idx,:), idx_size,1);
    block_table_3 = repelem(table_3(table_3_idx,:), idx_size,1);
    
    table_main = [table_main; table_1(i,:) block_table_2 block_table_3];
        
end

Indeed, the code above becomes progressively slow as main_table expands. Alternatively, I can do with preallocation, creating a big empty table_main and then allocate each row. However, it doesn't seem to me that I get such a large performance increment. I doubt the function join would give me much advantage as given the nature of IDs in table_1, table_2 and table_3 (the same ID can appear across multiple lines) I would be anyway forced to run the script for each line of table_1.

Any idea would be extremely welcomed as, for a table of size 130,000x14 it takes me half a day to do the merge!

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

dpb 2023년 2월 13일

Most inefficient, yes...

W/O an example dataset its hard (as in impossible) to fully grasp the nuances, but it appears the above is doing the same merge for every possible ID over and over -- unless the ID in table one is unique for each row.

Attach a .mat file with a small(ish) representative example of each of the tables -- 20-30 lines is plenty as long as it is representative of the overall content of the files...

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Lei Hou 2023년 3월 26일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1911550-how-to-speed-up-table-merge-concatenation#answer_1200989

MATLAB Online에서 열기

Hi Andrea,

I'm not sure whether ID in each table is unique (no duplicated value) and whether table_2 and table_3 contain all IDs in table_1. I tried the code you provided. Your code works only when table_2 and table_3 contain one row for each ID in table_1. Based on such assumption, you can use join to merge your tables.

table_1 = table([1;2;3;4],(1:4)',(11:14)','VariableNames',{'ID','t1_Var1','t1_Var2'});
table_2 = table([1;2;3;4;5;6],(21:26)',"s"+(21:26)', rand(6,1),'VariableNames',{'ID','t2_Var1','t2_Var2','t2_Var3'});
table_3 = table([1;3;2;5;4],{'a';'b';'c';'d';'e'},'VariableNames',{'ID','t3_Var1'});
tic;
ceiling = numel(table_1(:,1));
table_main = table;
for i=1:ceiling
    
    idx = table_1.ID(i);
    idx_size = numel(idx);
    table_2_idx = find(table_2.ID == idx);
    table_3_idx = find(table_3.ID == idx);
    
    block_table_2 = repelem(table_2(table_2_idx,2:end), idx_size,1);
    block_table_3 = repelem(table_3(table_3_idx,2:end), idx_size,1);
    
    table_main = [table_main; [table_1(i,:) block_table_2 block_table_3]];
        
end
toc; % On Windows, Elapsed time is 0.007766 seconds.
tic;
join(join(table_1,table_2),table_3);  % Need to call join twice because join only accepts two input tables.
toc; % On Windows, Elapsed time is 0.001880 seconds.

I checked the performance of join and your workflow. Using join is about 4x faster than your workflow. Please try join and see whether it provides good performance for you.

By the way, your code errors if table_1, table_2 and table_3 all contain ID variable.

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

How to speed up table merge / concatenation?

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

답변 (1개)

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

How to speed up table merge / concatenation?

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

답변 (1개)

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기