Alternative to outerjoin for large table concatenation?

Jeff, 26 March 2014
Hi MATLAB'ers,
I am looking for tips on how to speed up the vertical concatenation of tables. Specifically, I need to identify the columns each table is missing and append them before concatenating.
In my case, I have 2000 tables stored in a cell array and roughly 15000 unique columns (VariableNames). Each table has a randomly ordered subset of 10 to 3000 of these columns and, importantly, a column containing unique sample identifiers.
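To make the setup concrete, here is a minimal sketch with three hypothetical tables (T1, T2, T3 and the variables X, Y, Z are made up for illustration) that finds the union of variable names and, for each table, the variables it is missing. It assumes text-valued data stored as cell arrays of character vectors.

```matlab
% Hypothetical toy version of the setup: each table has a SampleID column
% plus some subset of the data variables, in no particular order.
T1 = table({'s1';'s2'}, {'a';'b'}, 'VariableNames', {'SampleID','X'});
T2 = table({'s3'}, {'c'}, {'d'}, 'VariableNames', {'SampleID','Y','X'});
T3 = table({'s4';'s5'}, {'e';'f'}, 'VariableNames', {'SampleID','Z'});
AllTablesCellArray = {T1, T2, T3};

% Variable names of every table, and their union in first-seen order.
AllVars = cellfun(@(t) t.Properties.VariableNames, AllTablesCellArray, ...
    'UniformOutput', false);
UniqueVars = unique([AllVars{:}], 'stable');

% For each table, the variables it lacks ('stable' keeps UniqueVars' order).
MissingVars = cellfun(@(v) setdiff(UniqueVars, v, 'stable'), AllVars, ...
    'UniformOutput', false);
```

Here UniqueVars is {'SampleID','X','Y','Z'} and MissingVars{1} is {'Y','Z'}.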
One way to tackle this is simply to call outerjoin in a loop. The outerjoin function reconciles the column headers for you, but it spends an inordinate amount of time in the joinInnerOuter.m and defaultarrayLike.m subfunctions.
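% Fold each remaining table into the running result. By default outerjoin
% uses every variable name the two tables share as a key, and 'MergeKeys'
% combines each pair of key variables into a single column.
MergeTable = AllTablesCellArray{1};
for nt = 2:numel(AllTablesCellArray)
    MergeTable = outerjoin(MergeTable, AllTablesCellArray{nt}, 'MergeKeys', true);
end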
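One variation on this loop that may be worth timing is to merge the tables pairwise in a tree (1 with 2, 3 with 4, and so on, then repeat on the results) instead of folding every table into one ever-growing result. This is a swapped-in idea, not part of the original approach: it keeps the same outerjoin call and defaults, but each row is copied roughly log2(2000), about 11, times rather than up to 1999 times. Whether it actually helps on real data is untested, and the toy tables below are hypothetical.

```matlab
% Hypothetical toy tables (same shape as the real problem, far smaller).
T1 = table({'s1';'s2'}, {'a';'b'}, 'VariableNames', {'SampleID','X'});
T2 = table({'s3'}, {'c'}, {'d'}, 'VariableNames', {'SampleID','Y','X'});
T3 = table({'s4';'s5'}, {'e';'f'}, 'VariableNames', {'SampleID','Z'});
AllTablesCellArray = {T1, T2, T3};

% Tree reduction: each round joins neighbouring pairs, halving the count.
tbls = AllTablesCellArray;
while numel(tbls) > 1
    nIn = numel(tbls);
    nextLevel = cell(1, ceil(nIn/2));
    for k = 1:floor(nIn/2)
        nextLevel{k} = outerjoin(tbls{2*k-1}, tbls{2*k}, 'MergeKeys', true);
    end
    if mod(nIn, 2) == 1
        nextLevel{end} = tbls{end};   % odd table out passes through unchanged
    end
    tbls = nextLevel;
end
MergeTable = tbls{1};
```

Because the joins within one round are independent of each other, the inner loop could in principle become a parfor loop.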
Another strategy is to pad each table with the columns it is missing and then call vertcat. This works, but it is slower. On the other hand, the padding loop can be run as a parfor loop.
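% Collect every table's variable names and their union (first-seen order).
AllVars = cellfun(@(x) x.Properties.VariableNames, AllTablesCellArray, 'UniformOutput', false);
UniqueVars = unique([AllVars{:}], 'stable');
for nt = 1:numel(AllTablesCellArray)
    % Variables this table lacks, padded with empty char vectors
    % (assumes every data variable holds text, so '' is a valid fill).
    MissingVars = setdiff(UniqueVars, AllVars{nt}, 'stable');
    if ~isempty(MissingVars)
        AllTablesCellArray{nt}{:, MissingVars} = repmat({''}, height(AllTablesCellArray{nt}), numel(MissingVars));
    end
end
% vertcat pairs up table variables by name, so their order may differ.
MergeTable = vertcat(AllTablesCellArray{:});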
I am hoping my brain is just fried and I am missing something obvious. I would prefer to avoid converting each table to a cell array, but that might turn out to be a good way to go.
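For comparison, here is a minimal sketch of the cell-array route under the same text-only assumption: preallocate one cell array covering all rows and all variables, prefilled with '', copy each table's contents into the right columns using the indices from ismember, and build the table once with cell2table. The toy tables are hypothetical; with 2000 tables and about 15000 variables, the preallocated cell array itself may need a lot of memory.

```matlab
% Hypothetical toy tables.
T1 = table({'s1';'s2'}, {'a';'b'}, 'VariableNames', {'SampleID','X'});
T2 = table({'s3'}, {'c'}, {'d'}, 'VariableNames', {'SampleID','Y','X'});
T3 = table({'s4';'s5'}, {'e';'f'}, 'VariableNames', {'SampleID','Z'});
AllTablesCellArray = {T1, T2, T3};

% Union of variable names, in first-seen order.
AllVars = cellfun(@(t) t.Properties.VariableNames, AllTablesCellArray, ...
    'UniformOutput', false);
UniqueVars = unique([AllVars{:}], 'stable');

% One cell array for the whole result, prefilled with the padding value ''
% (assumes all data variables hold text).
heights = cellfun(@height, AllTablesCellArray);
Big = repmat({''}, sum(heights), numel(UniqueVars));

rowStart = 1;
for nt = 1:numel(AllTablesCellArray)
    [~, colIdx] = ismember(AllVars{nt}, UniqueVars);   % target column of each variable
    rows = rowStart:(rowStart + heights(nt) - 1);
    Big(rows, colIdx) = table2cell(AllTablesCellArray{nt});
    rowStart = rowStart + heights(nt);
end

% Build the table once instead of growing or padding it repeatedly.
MergeTable = cell2table(Big, 'VariableNames', UniqueVars);
```

If some variables are numeric, their columns would need a different fill value (for example, {NaN}) so that cell2table can convert them to numeric variables.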
Thanks!
-Jeff

Answers (0)
