Parsing a comma-delimited column into multiple vectors and cell arrays
Hi,
I am importing a series of CSV files, each with 18 columns and a different number of rows (up to 800,000), using the following code:
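for i = 1:135
    % Import the data
    fullFileName = sprintf('%s%d%s', 'C:\Users\Joseph\Documents\MATLAB\CS\CSV\', i, '.csv');
    fid = fopen(fullFileName, 'rt');
    % '%s' with no Delimiter splits only on whitespace, so each row comes back as one string
    M = textscan(fid, '%s', 'CollectOutput', 1, 'HeaderLines', 0);
    fclose(fid);
    X = M{1,1};   % X is overwritten on every pass through the loop
end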
The issue is that X is a cell array in which each element holds a whole comma-delimited row. For instance, the first two rows are:
1st row:
'CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR'
2nd row:
'00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0'
The first row contains the column headers and the second row contains data. All I want is to create cell and numeric variables (depending on the type of data), each named after the corresponding header and holding the corresponding data from the remaining rows, i.e. a cell array called CUSIP_ID containing {'00846UAG6'}, a vector RPTD_PR = [101.636], etc.
Is there a way to parse the data in X?
1 Comment
Jan, 8 Jul 2012
				I do not understand the question. Would textscan(... 'delimiter', ',') solve the problem already?
Btw. it is called "parsing" with "s".
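Jan's textscan suggestion can be sketched like this. It is a minimal example, not code from the thread: it writes the two rows quoted in the question to a hypothetical temporary file, then reads it back with textscan, splitting fields on ',' and skipping the header line. Which columns are text (%s) and which are numeric (%f) is an assumption based only on the sample row; ASCII_RPTD_VOL_TX is kept as text here just to be safe.

```matlab
% Sketch: read a CSV file with textscan, splitting fields on commas.
% ASSUMPTION: the %s/%f pattern below is guessed from the sample row.
fileName = [tempname '.csv'];   % hypothetical demo file, not one of the 135 CSVs
fid = fopen(fileName, 'wt');
fprintf(fid, '%s\n', ['CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,' ...
    'TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,' ...
    'SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,' ...
    'RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR']);
fprintf(fid, '%s\n', '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0');
fclose(fid);

% Columns 1-7 and 11-13 as text, 8-10 and 14-18 as numbers (18 in total).
fmt = [repmat('%s', 1, 7), repmat('%f', 1, 3), repmat('%s', 1, 3), repmat('%f', 1, 5)];

fid = fopen(fileName, 'rt');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fileName);

% C{1} is a cell array of CUSIP_ID strings; C{8} is a numeric column of RPTD_PR.
```

Inside the question's loop, the same textscan call (with the real file names) would replace the '%s'-only call, so each column arrives as its own cell array or numeric vector instead of one string per row.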
Answers (1)
Walter Roberson, 8 Jul 2012
        Please do not proceed that way.
3 Comments
Jan, 8 Jul 2012
				Is this really the same question as above?
C = {'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL'};
FileName2 = ['Issuer' num2str(UIssuer(i))];
save(FileName2, C{:});
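What a call like save(FileName2, C{:}) does can be checked with a small sketch: C{:} expands the cell into a comma-separated list of variable names, so each listed variable must already exist in the workspace. The values below are taken from the sample row in the question, and a temporary file name stands in for ['Issuer' num2str(UIssuer(i))], since UIssuer is not defined anywhere in the thread.

```matlab
% Sketch: save only the variables whose names are listed in a cell array.
% The variables must exist before save is called; values come from the sample row.
CUSIP_ID       = {'00846UAG6'};
BOND_SYM_ID    = {'A.GF'};
COMPANY_SYMBOL = {'A'};
C = {'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL'};

FileName2 = [tempname '.mat'];   % hypothetical stand-in for ['Issuer' num2str(UIssuer(i))]
save(FileName2, C{:});           % same as save(FileName2, 'CUSIP_ID', 'BOND_SYM_ID', 'COMPANY_SYMBOL')

S = load(FileName2);             % load into a struct instead of the workspace
delete(FileName2);
```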
Walter Roberson, 8 Jul 2012 (edited 8 Jul 2012)
			You wrote,
All I want is to create cell and numeric variables (depending on the type of data) where each variable has the name of the respective name in the headers and has the corresponding data from the rest of rows.
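You are therefore asking to compute variable names. That is not a good idea; it brings many problems, for example names that clash with existing variables or functions, and code that is hard to read and debug because the variables never appear in the source.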
In your situation, I recommend using dynamic field names in a structure, and then saving with save() and the -struct flag.
The parsing is easy:
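hdrNames = regexp( FirstRow, ',', 'split');    % header names (a name like fieldnames would shadow the built-in)
fieldvals = regexp( SecondRow, ',', 'split');  % values, still as text
tempcell = [hdrNames; fieldvals];              % 2-by-N: each column is a name/value pair
savestruct = struct( tempcell{:} );            % one field per column
save( FileName, '-struct', 'savestruct');      % '-struct' goes before the struct name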
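For a self-contained check, here is a sketch that runs the two rows quoted in the question through the same split / pair / struct steps and saves the result. FileName is a hypothetical temporary file, and the rows are hard-coded demo input. Note that save expects the '-struct' option to come before the structure's name.

```matlab
% Sketch: header row + one data row -> struct -> one variable per field in a MAT-file.
% Demo input: the two rows quoted in the question.
FirstRow = ['CUSIP_ID,BOND_SYM_ID,COMPANY_SYMBOL,TRD_EXCTN_DT,' ...
    'TRD_EXCTN_TM,TRC_ST,ASCII_RPTD_VOL_TX,RPTD_PR,YLD_PT,DAYS_TO_STTL_CT,' ...
    'SALE_CNDTN_CD,SPCL_TRD_FL,DISS_RPTG_SIDE_CD,RPTD_HIGH_PR,HIGH_YLD_PT,' ...
    'RPTD_LOW_PR,LOW_YLD_PT,RPTD_LAST_PR'];
SecondRow = '00846UAG6,A.GF,A,1/3/2011,17:21:06,T,1700000,101.636,4.78396,0,A,,B,0,0,0,0,0';

hdrNames = regexp(FirstRow,  ',', 'split');  % 1-by-18 cell of header names
rowVals  = regexp(SecondRow, ',', 'split');  % 1-by-18 cell of text; ',,' gives ''
pairs    = [hdrNames; rowVals];              % column k holds {name_k; value_k}
s        = struct(pairs{:});                 % scalar struct, one field per column

FileName = [tempname '.mat'];                % hypothetical output file
save(FileName, '-struct', 's');              % each field saved as a separate variable
loaded = load(FileName);
delete(FileName);
```

Every field is still text at this point (RPTD_PR is '101.636'). If a header were not a valid MATLAB identifier, struct would raise an error; newer releases provide matlab.lang.makeValidName for cleaning such names.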
The step that this misses is converting numeric-looking fields to numeric values. In order to do that, you have to know ahead of time which fields must be numeric, or you have to set rules about the forms that are okay to convert to numeric. Keep in mind as you construct those rules that some strings that contain the characters 'e', 'E', 'i', 'I', '-', '+' or '.' are considered to be convertible to numeric, so you can end up surprised if something you "know" should be a text field just happened to contain "E0", which is interpretable as "0E0" which is 0.
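The first option, deciding ahead of time which fields are numeric, might look like the sketch below. The list of numeric fields is an assumption drawn from the sample row, and the struct is a trimmed, hypothetical version of one parsed row. str2double returns NaN for text it cannot convert; the last line shows why converting every number-looking field is risky, using a hypothetical all-digit identifier that loses its leading zero.

```matlab
% Sketch: convert only the fields known in advance to be numeric.
% ASSUMPTION: numericFields is guessed from the sample row, not from a spec.
s = struct('CUSIP_ID', '00846UAG6', 'RPTD_PR', '101.636', ...
           'YLD_PT', '4.78396', 'SPCL_TRD_FL', '');   % trimmed example row (text only)
numericFields = {'RPTD_PR', 'YLD_PT'};

for k = 1:numel(numericFields)
    name = numericFields{k};
    s.(name) = str2double(s.(name));   % dynamic field name; non-numeric text -> NaN
end

% Why not convert everything str2double accepts? A hypothetical all-digit ID:
idAsNumber = str2double('012345678');  % becomes 12345678, the leading zero is lost
```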