Editing a Tall table and writing it into a csv file

Rachel Leber on 17 Mar 2020
Answered: Tom W on 7 Jan 2021
Hi,
I have a really large csv file (about 6 million rows and 30 columns). I want to edit specific columns of this file and save the changes.
I tried creating a tall table from a datastore, extracting and manipulating the relevant columns, and then assigning them back into the table. However, when I attempt to write the new tall table, I get the following error:
Error using tall/subsasgn (line 29)
Incompatible tall array arguments. The first dimension in each tall array
must have the same size, or have a size of 1.
This happens even though I had no problems editing the table before attempting to write it.
Relevant code:
%get csv file
[file,path] = uigetfile('*.csv');
source = [path,file];
%% create tall table
ds = datastore(source);
ds.TextscanFormats{1} = '%s';
ds.Delimiter = ',';
tTable = tall(ds);
%retrieve relevant column data
colX = gather(tTable.colX);
Flag = gather(tTable.Flag);
combinedFlag = colX2flag(Flag,colX); %this is a function that manipulates the data
combinedFlag = tall(combinedFlag);
%%
% put data back into table
tTable.colX(:) = combinedFlag;
tTable.colY(:) = combinedFlag;
tTable.colZ(:) = combinedFlag;
%%
write('C:\Users\.......\test_*.csv',tTable); %obviously no ..... in the actual code
In addition, if I try to write tTable without any manipulation, it splits the result into many csv files. Is there a way to save all the data into just one file?
  Comments: 4
Guillaume on 17 Mar 2020
Right, so the problem is actually from the line:
tTable.colX(:) = combinedFlag;
MATLAB evaluates tall arrays lazily, so the assignment is only actually carried out once you call write. That is why you don't get an error on the line itself, even though that is where the problem lies. It seems that combinedFlag doesn't have the same number of rows as the original array, which is indeed a problem.
Rachel Leber on 17 Mar 2020
Thank you for the quick responses.
It is actually the exact same size. I was able to solve this by gathering the entire table, editing it as a regular table, and then calling tall and write. Regarding writing the entire table into one csv file, I'm rather unfamiliar with tall tables and parallel processing. Would you kindly elaborate on your suggestion to ignore filenames?


Answers (2)

Guillaume on 17 Mar 2020
"It is actually the exact same size"
If I recall correctly, you do get some misleading error messages when you try to combine tall arrays that come from different datastores, which is the case here: your combinedFlag tall array is completely disconnected from the original tall array because it has been through a gather. My understanding is that combining tall arrays like that is not supported.
To fix the problem, you would have to get rid of the gather and modify your colX2flag function so that it can operate directly on tall arrays.
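As a rough sketch of what that could look like: the body of colX2flag is not shown in the question, so the elementwise expression below is only a hypothetical stand-in for it, and it assumes that colX and Flag are numeric or logical columns. The input path is also a placeholder. The point is that every array stays derived from the same tall table, with no gather in between.

```matlab
% Sketch only: 'data.csv' is a placeholder path, and the combinedFlag
% expression is a hypothetical stand-in for the real colX2flag logic.
source = 'data.csv';
ds = datastore(source);
tTable = tall(ds);

% Elementwise operations work directly on tall arrays, so no gather is needed.
% Assumes colX and Flag are numeric or logical columns.
combinedFlag = (tTable.Flag ~= 0) | (tTable.colX ~= 0);

% Replace whole columns; all operands come from the same tall table.
tTable.colX = combinedFlag;
tTable.colY = combinedFlag;
tTable.colZ = combinedFlag;

% Nothing is computed until write (or gather) triggers evaluation.
write('C:\somewhere\test_*.csv', tTable);
```

If colX2flag cannot be expressed with operations that tall arrays support, this approach would not apply as written.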
However, since you have enough memory to gather the entire table, there's no point in using tall arrays. You can just use regular tables, which would solve all your problems:
%get csv file
[file,path] = uigetfile('*.csv');
source = fullfile(path, file); %prefer fullfile to concatenation
tTable = readtable(source, 'Delimiter', ',');
combinedFlag = colX2flag(tTable.Flag, tTable.colX);
tTable.colX(:) = combinedFlag;
tTable.colY(:) = combinedFlag;
tTable.colZ(:) = combinedFlag;
writetable(tTable, 'C:\somewhere\test.csv');
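On the single-file part of the question: write on a tall table produces multiple output files, whereas writetable on an in-memory table produces one. If the data is already in a tall table, one possible way to get a single csv is to gather it first. This is only a sketch and assumes the gathered table fits in memory; the input path is a placeholder.

```matlab
% Sketch only: assumes the whole table fits in memory once gathered.
tTable = tall(datastore('data.csv'));    % 'data.csv' is a placeholder path
T = gather(tTable);                      % evaluates the tall table into a regular table
writetable(T, 'C:\somewhere\test.csv');  % writes a single csv file
```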

Tom W on 7 Jan 2021
Did you figure it out?
