Parsing a comma-delimited column into multiple vectors and cell arrays

joseph Frank on 7 Jul 2012
Hi,
I am importing a series of CSV files of 18 columns each, with different numbers of rows (up to 800,000), using the following code:
for i = 1:135
    %% Import the data
    fullFileName = sprintf('%s%d%s', 'C:\Users\Joseph\Documents\MATLAB\CS\CSV\', i, '.csv');
    fid = fopen(fullFileName, 'rt');
    M = textscan(fid, '%s', 'collectoutput', 1, 'headerlines', 0);
    fclose(fid);
    X = M{1,1};
end
The issue is that X is a cell array in which each row is a single comma-delimited string. For instance, the first two rows are the following.
1st row:
'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR'
2nd row:
'00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0'
The first row holds the column headers and the second row contains data. All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows. That is, I want to create a cell array called CUSIP_ID containing {'00846UAG6'}, a vector RPTD_PR = [101.636], and so on.
Is there a way to parse the data in X?
Comments: 1
Jan on 8 Jul 2012
I do not understand the question. Would textscan(... 'delimiter', ',') solve the problem already?
Btw. it is called "parsing" with "s".
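A minimal sketch of what the textscan suggestion above could look like, using the two sample rows from the question written to a temporary file. The per-column format string is an assumption inferred from that single data row (ASCII_RPTD_VOL_TX is read as text here, to be safe), so it would need checking against the real files:

```matlab
% Write the two sample rows from the question to a temporary CSV file
tmpFile = [tempname '.csv'];
fid = fopen(tmpFile, 'wt');
fprintf(fid, '%s\n', 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR');
fprintf(fid, '%s\n', '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0');
fclose(fid);

% One specifier per column: %s for text, %f for numbers.
% ASSUMPTION: the column types are guessed from the single sample row.
fmt = '%s %s %s %s %s %s %s %f %f %f %s %s %s %f %f %f %f %f';

fid = fopen(tmpFile, 'rt');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1);  % skip the header row
fclose(fid);
delete(tmpFile);

CUSIP_ID = C{1};   % cell array of strings, one entry per data row
RPTD_PR  = C{8};   % numeric column vector
```

Each element of C holds one whole column, so the same call should work unchanged for files with many rows.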


Answers (1)

Walter Roberson on 8 Jul 2012
Comments: 3
Jan on 8 Jul 2012
Is this really the same question as above?
C = {'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL'};
FileName2 = ['Issuer' num2str(UIssuer(i))];
save(FileName2, C{:});
Walter Roberson on 8 Jul 2012
Edited: Walter Roberson on 8 Jul 2012
You wrote,
All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows.
You are therefore asking to compute variable names. It is not a good idea to do that; there are many associated problems.
In your situation, I recommend using dynamic field names in a structure, and then saving with save() and the -struct flag.
The parsing is easy:
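% Split the header row and one data row on commas
names = regexp( FirstRow, ',', 'split');   % avoid shadowing the built-in fieldnames()
vals = regexp( SecondRow, ',', 'split');
% Interleave as name1, value1, name2, value2, ... for struct()
tempcell = [names; vals];
savestruct = struct( tempcell{:} );
% '-struct' must come before the variable name; each field is saved as its own variable
save( FileName, '-struct', 'savestruct');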
The step that this misses is converting numeric-looking fields to numeric values. To do that, you either have to know ahead of time which fields must be numeric, or you have to set rules about which forms are okay to convert to numeric. Keep in mind as you construct those rules that some strings containing the characters 'e', 'E', 'i', 'I', '-', '+' or '.' are considered convertible to numeric, so you can end up surprised if something you "know" should be a text field just happens to contain "E0", which is interpretable as "0E0", which is 0.
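As a rough sketch of the "know ahead of time" option, the conversion could be limited to an explicit list of numeric columns, so that text fields never go through str2double. The list numericFields below is an assumption based on the sample row in the question, and the output file name is a placeholder:

```matlab
% Sample rows taken from the question
firstRow  = 'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR';
secondRow = '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0';

names = regexp(firstRow,  ',', 'split');
vals  = regexp(secondRow, ',', 'split');

% ASSUMPTION: these are the columns that should be numeric; adjust for the real data
numericFields = {'RPTD_PR', 'YLD_PT', 'DAYS_TO_STTL_CT', 'RPTD_HIGH_PR', ...
                 'HIGH_YLD_PT', 'RPTD_LOW_PR', 'LOW_YLD_PT', 'RPTD_LAST_PR'};

% Convert only the listed columns; everything else stays a string
isNum = ismember(names, numericFields);
vals(isNum) = num2cell(str2double(vals(isNum)));

% Build the structure and save each field as its own variable
tempcell = [names; vals];
savestruct = struct(tempcell{:});
outFile = [tempname '.mat'];   % placeholder file name
save(outFile, '-struct', 'savestruct');
```

Loading the file afterwards with load(outFile) should give variables named after the headers, without ever computing variable names in the workspace.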
