Building a tall table from tall arrays generates an error
clear
dataFile = 'data.csv';
% Datastore over the CSV file; only the "hash" (text) and "count" (numeric) columns are read
ds = tabularTextDatastore(dataFile, FileExtensions='.csv');
ds.ReadVariableNames = true;
ds.Delimiter = ',';
ds.SelectedVariableNames = ["hash", "count"];
ds.SelectedFormats = {'%s', '%f'};
data = tall(ds);
% g: group number of each row; THash: the unique hash values
[g, THash] = findgroups(data.hash);
% TCount: one cell per group, holding all "count" values for that hash
TCount = splitapply(@(x) {x}, data.count, g);
%% This works, but I cannot use it because the actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
T1 = table(hash, count);
%% This is the intended code, but it doesn't work
TT = table(THash, TCount);
write(fullfile(pwd,'data'), TT, FileType="parquet");
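The grouping step can be tried without the real data file. The sketch below is a minimal illustration that uses ordinary in-memory (non-tall) arrays, so it only mirrors the gather-based T1 path; the hash and count values are made up. It shows that the second table variable is a cell column with one cell per unique hash.

```matlab
% Hypothetical sample data standing in for the "hash" and "count" columns
hash  = {'a'; 'b'; 'a'; 'c'; 'b'};
count = [10; 20; 30; 40; 50];

% G holds the group number of each row; ids holds the unique hashes (sorted)
[G, ids] = findgroups(hash);

% Collect the counts of each group into one cell per group
grouped = splitapply(@(x) {x}, count, G);

% One row per unique hash; the second variable is a cell column of double vectors
T = table(ids, grouped, 'VariableNames', {'hash', 'count'});
disp(T)
```

Here grouped is {[10;30]; [20;50]; 40}, so each row of T pairs a hash with a variable-length vector rather than a single number.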
Answers (1)
Oguz Kaan Hancioglu
15 March 2023
Your code didn't work because gather(TCount) returns a cell array with one cell per group, and each cell holds a double array. You are therefore trying to write a whole double array into a single cell. Instead, you can compute the length of the array in each cell. I hope this solves your problem.
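%% This works, but it cannot be used when the actual data file is far larger than memory
hash = gather(THash);
count = gather(TCount);
% Size of the array stored in each cell (one cell per group)
cellsz = cellfun(@size, count, 'UniformOutput', false);
% Keep only the number of rows, i.e. how many values each hash has
newCount = cellfun(@(x) x(1), cellsz, 'UniformOutput', false)
T1 = table(hash, newCount);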
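The two cellfun calls above can be collapsed into one. This is a minimal sketch in which hash and count are hypothetical three-group stand-ins for the gathered values from the answer's code. numel returns the number of elements in each cell, which equals size(x,1) for the column vectors produced by splitapply. Unlike the version above, which keeps newCount as a cell array, the uniform output here is an ordinary double column.

```matlab
% Hypothetical stand-ins for the gathered hash values and per-group count vectors
hash  = {'a'; 'b'; 'c'};
count = {[10; 30]; [20; 50]; 40};

% Number of values in each group's array; a plain double column, not a cell
newCount = cellfun(@numel, count);

T1 = table(hash, newCount);
disp(T1)
```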