Alternative to outerjoin for large table concatenation?
Hi MATLAB'ers,
I am looking for tips on how to speed up vertical concatenation of tables. Specifically, I want to know how to identify and append the missing columns to each table before concatenation.
In my case, I have 2000 tables stored in a cell array and ~15000 unique columns (VariableNames). Each table has a randomly ordered subset of between 10 and 3000 of these columns and, importantly, a column containing unique sample identifiers.
One way of tackling this is to simply call outerjoin in a loop. outerjoin normalizes the column headers for you, but it spends an inordinate amount of time in the joinInnerOuter.m and defaultarrayLike.m subfunctions.
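% Start from the first table and join the rest onto it one at a time.
MergeTable = AllTablesCellArray{1};
for nt = 2:length(AllTablesCellArray)
    % No 'Keys' given, so all shared variables act as keys; MergeKeys collapses them.
    MergeTable = outerjoin(MergeTable,AllTablesCellArray{nt},'MergeKeys',true);
end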
Another strategy is to pad the individual tables with the extra (missing) columns and then vertcat them. This works, but it is slower than the outerjoin loop. On the other hand, the padding loop can be run as a parfor loop.
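% Column names of every table, and their union in first-seen order.
AllVars = cellfun(@(x) x.Properties.VariableNames,AllTablesCellArray,'UniformOutput',false);
UniqueVars = unique([AllVars{:}],'stable');
for nt = 1:length(AllTablesCellArray)
    % Columns that exist somewhere else but not in this table.
    MissingVars = UniqueVars(~ismember(UniqueVars,AllVars{nt}))';
    if ~isempty(MissingVars)
        % Pad with '' so every table ends up with the full set of columns.
        AllTablesCellArray{nt}{:,MissingVars} = repmat({''},height(AllTablesCellArray{nt}),length(MissingVars));
    end
end
% vertcat matches table variables by name, so column order may differ between tables.
MergeTable = vertcat(AllTablesCellArray{:});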
I am hoping that my brain is just fried and that I am missing something obvious. My hope is to avoid converting each table to a cell array, but this could be a good way to go.
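For reference, the cell-array route could be sketched roughly as follows. This is only a minimal sketch on hypothetical toy data (the SampleID, GeneA and GeneB names are made up), and it assumes every column holds text, so that '' is an acceptable filler and everything fits in one preallocated cell array. I have not profiled it against the two approaches above.

```matlab
% Hypothetical toy input standing in for the real AllTablesCellArray.
AllTablesCellArray = { ...
    table({'s1';'s2'}, {'a';'b'}, 'VariableNames', {'SampleID','GeneA'}), ...
    table({'x'}, {'s3'}, {'y'}, 'VariableNames', {'GeneB','SampleID','GeneA'})};

% Column names of every table, and their union in first-seen order.
AllVars    = cellfun(@(x) x.Properties.VariableNames, AllTablesCellArray, 'UniformOutput', false);
UniqueVars = unique([AllVars{:}], 'stable');

% First and last row that each table occupies in the merged result.
Heights  = cellfun(@height, AllTablesCellArray);
RowEnd   = cumsum(Heights);
RowStart = RowEnd - Heights + 1;

% Preallocate the merged cell array, filled with the '' placeholder.
MergedCell = repmat({''}, sum(Heights), numel(UniqueVars));

for nt = 1:numel(AllTablesCellArray)
    % Position of each of this table's columns within UniqueVars.
    [~, ColIdx] = ismember(AllVars{nt}, UniqueVars);
    % Copy the table's contents into its row block, reordering columns by name.
    MergedCell(RowStart(nt):RowEnd(nt), ColIdx) = table2cell(AllTablesCellArray{nt});
end

% Convert back to a table once, at the very end.
MergeTable = cell2table(MergedCell, 'VariableNames', UniqueVars);
```

The idea is that each table is written into its row block exactly once, so no table is padded or re-joined repeatedly; the cost is holding the whole result as a cell array before the single cell2table call.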
Thanks!
-Jeff