Editing a Tall table and writing it into a csv file
Hi,
I have a really large CSV file (about 6 million rows and 30 columns). I want to edit specific columns of this file and save the changes.
I tried creating a tall table from a datastore, extracting and manipulating the relevant columns, and then assigning them back into the table. However, when I attempt to write the new tall table, I get the following error:
Error using tall/subsasgn (line 29)
Incompatible tall array arguments. The first dimension in each tall array
must have the same size, or have a size of 1.
This happens even though I had no problems editing the table before attempting to write it.
Relevant code:
%get csv file
[file,path] = uigetfile('*.csv');
source = [path,file];
%% create tall table
ds = datastore(source);
ds.TextscanFormats{1} = '%s';
ds.Delimiter = ',';
tTable = tall(ds);
%retrieve relevant column data
colX = gather(tTable.colX);
Flag = gather(tTable.Flag);
combinedFlag = colX2flag(Flag,colX); %this is a function that manipulates the data
combinedFlag = tall(combinedFlag);
%%
% put data back into table
tTable.colX(:) = combinedFlag;
tTable.colY(:) = combinedFlag;
tTable.colZ(:) = combinedFlag;
%%
write('C:\Users\.......\test_*.csv',tTable); %obviously no ..... in the actual code
In addition, if I try to write tTable without any manipulation, it splits the result into many CSV files. Is there a way to save all the data into just one file?
Comments: 4
Guillaume
17 Mar 2020
Right, so the problem is actually from the line:
tTable.colX(:) = combinedFlag;
MATLAB only performs the actual assignment once you call write, which is why you don't get an error on that line, but that is where the problem is. It seems that your combinedFlag doesn't have the same number of rows as the original array, which is indeed a problem.
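A minimal sketch of that deferred behaviour, using a small in-memory tall array instead of your datastore (the variable names are made up for illustration, and mapreducer(0) is only there to keep the evaluation in the local session):

```matlab
mapreducer(0);        % evaluate tall computations serially in the local session

A = tall((1:5)');     % small in-memory tall array, for illustration only
B = 2 .* A;           % queued, not computed yet: B is an unevaluated tall array
C = gather(B);        % evaluation happens here, so errors in queued steps surface here
disp(C')
```

The same applies to write: any problem in the queued assignments is only reported when write triggers the evaluation.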
Answers (2)
Guillaume
17 Mar 2020
"It is actually the exact same size"
If I recall correctly, you do indeed get misleading error messages when you try to combine tall arrays that come from different datastores, which is the case here: your combinedFlag tall array is completely disconnected from the original tall array because it has been through a gather. My understanding is that combining tall arrays like that is not supported.
To fix the problem, you would have to get rid of the gather and modify your colX2flag function so that it can operate directly on tall arrays.
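As a rough sketch of what that could look like, assuming colX2flag only needs element-wise operations: the rule inside colX2flagTall below is invented purely for illustration (I don't know what your function does), and the tiny CSV in a temporary folder stands in for your real file.

```matlab
mapreducer(0);  % evaluate serially in the local session

% Build a tiny stand-in CSV (the real file would be the one chosen with uigetfile)
workDir = tempname;
mkdir(workDir);
source = fullfile(workDir, 'input.csv');
writetable(table([1;0;0;1], [0;5;0;-2], [9;9;9;9], ...
    'VariableNames', {'Flag','colX','colY'}), source);

ds = tabularTextDatastore(source, 'Delimiter', ',');
tTable = tall(ds);

% Hypothetical stand-in for colX2flag: element-wise operations only, so it
% accepts tall inputs and returns a tall result tied to the same datastore
colX2flagTall = @(flag, colX) double((flag ~= 0) | (colX > 0));

combinedFlag = colX2flagTall(tTable.Flag, tTable.colX);  % still tall, no gather
tTable.colX = combinedFlag;
tTable.colY = combinedFlag;

outDir = fullfile(workDir, 'out');
mkdir(outDir);                                   % empty output folder
write(fullfile(outDir, 'test_*.csv'), tTable);   % evaluation is triggered here

result = readall(tabularTextDatastore(outDir));  % read the written file(s) back
disp(result)
```

If the function needs more than element-wise operations but can still work on one block of rows at a time, matlab.tall.transform may be an option for applying it to each block.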
However, since you have enough memory to gather the entire table, there's no point in using tall arrays. You can just use a regular table, which would solve all your problems:
%get csv file
[file,path] = uigetfile('*.csv');
source = fullfile(path, file); %prefer fullfile to concatenation
tTable = readtable(source, 'Delimiter', ',');
combinedFlag = colX2flag(tTable.Flag, tTable.colX);
tTable.colX(:) = combinedFlag;
tTable.colY(:) = combinedFlag;
tTable.colZ(:) = combinedFlag;
writetable(tTable, 'C:\somewhere\test.csv');
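If the table ever stops fitting in memory and you still want a single output file, one possible approach is to read the datastore in chunks and append each chunk to the same CSV. This is only a sketch under a few assumptions: the edit can be done on each chunk independently (the doubling of colX below is a made-up stand-in for colX2flag), the file names are placeholders in a temporary folder, and your release supports the 'WriteMode' option of writetable (R2020a or later).

```matlab
% Tiny stand-in input; the real source would be the large CSV
workDir = tempname;
mkdir(workDir);
source  = fullfile(workDir, 'input.csv');
outFile = fullfile(workDir, 'test.csv');
writetable(table((1:6)', (6:-1:1)', 'VariableNames', {'Flag','colX'}), source);

ds = tabularTextDatastore(source, 'Delimiter', ',');
ds.ReadSize = 2;   % rows per chunk; deliberately tiny here, use a much larger value in practice

isFirstChunk = true;
while hasdata(ds)
    chunk = read(ds);                  % ordinary in-memory table holding one chunk
    chunk.colX = chunk.colX .* 2;      % hypothetical per-row edit standing in for colX2flag
    if isFirstChunk
        writetable(chunk, outFile);    % creates the file and writes the header row
        isFirstChunk = false;
    else
        % append the rows only, without repeating the header
        writetable(chunk, outFile, 'WriteMode', 'append', 'WriteVariableNames', false);
    end
end

merged = readtable(outFile);           % read back the single output file
```

This keeps only one chunk in memory at a time, at the cost of requiring that the edit does not depend on rows outside the current chunk.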