Read a column which has comma seperated values

조회 수: 17 (최근 30일)
Ratna
Ratna 2012년 4월 8일
Hi,
I have a .csv file which contains 5 columns - the first represents time, second to fifth columns are numbers, among the these columns the third column contains number separated with 'comma'. I was able to read the data, if there is no comma, but how to read the data with comma??
My .csv looks like this:
0:00:02 6 984 186.8 11.76
0:00:03 7 984 186.8 11.76
0:00:04 7 984 186.8 11.76
0:00:05 7 1,123 183.2 10.2
0:00:06 6 1,123 183.2 10.2
0:00:07 6 1,123 183.2 10.2
0:00:08 4 1,123 183.2 10.2
Here is my code to read the .csv file:
fid = fopen('file.csv');
allData = textscan(fid,'%s %d %d %d %*d','delimiter',',','collectoutput',1,'headerlines',1);
fclose(fid);
My code was able to read the first three line only.
Is there a way to read the comma separated values using textscan??
  댓글 수: 2
per isakson
per isakson 2012년 4월 9일
I'm stubborn. My code below reads all seven lines! The ",", which I'm convinced is a "thousand separator", must be handled by special code.
per isakson
per isakson 2012년 4월 9일
textscan reads "comma separated values" without problems. However, your example is not "comma separated values". The extension "csv" doesn't change that.

댓글을 달려면 로그인하십시오.

답변 (3개)

per isakson
per isakson 2012년 4월 8일
The problem is most likely a 'thousand separator'. "[...]contains number separated with 'comma'." That is not two number separated by 'comma', but one number larger the one thousand.
0:00:02 6 984 186.8 11.76
0:00:03 7 984 186.8 11.76
0:00:04 7 984 186.8 11.76
0:00:05 7 1,123 183.2 10.2
0:00:06 6 1,123 183.2 10.2
0:00:07 6 1,123 183.2 10.2
0:00:08 4 1,123 183.2 10.2
See my answere to Data extraction from .txt file. This provides three alternative solutions. The last one offers the opportunity to operate on the cell array, cac, before parsing. Add the line
cac = strrep( cac{:}, ',', '' );
to remove the "thousand separator" (','). This illustrates one way to do it. You will need to edit the code. This smells a bit, but I don't think there is a better way.
--- EDIT ---
Your sample text contains three different separators - I assume
  1. decimal separator, "."
  2. thousand separator, ","
  3. list separator, " " --- EDIT --- 'Delimiter'
Matlab don't recognize the "thousand separator" regardless of what character is used to represent it. It has to be removed.
In addition "0:00:02" causes problems. There is only one character for hour.
This code reads the text and parses the numeric data. Time must be parsed in some dirty way. How come it is allowed to use a format like this.
fid = fopen( '.\cssm.txt', 'r' );
cac = textscan(fid,'%s','Whitespace','',Delimiter','\n','Headerlines',1 );
sts = fclose( fid );
cac = cac{:};
cac = strrep( cac, ',', '' ); % remove the "thousand separator"
cac = strtrim( cac ); % not needed(?)
str = transpose( char( cac ) );
str = cat( 1, str, repmat( ' ', 1, size(str,2) ) );
ca2 = textscan( str, '%s%f%f%f%f' );
num = [ ca2{2:end} ];
  댓글 수: 6
per isakson
per isakson 2012년 4월 9일
Two different types of separators cannot be represented with the same character. Matlab cannot handle that. Since "dot" is used as decimal separator it cannot be used as list separator in the same file.
If there is one header line in the file add "'Headerlines', 1" to the first textscan call.
I cannot believe that "1,123" should be interpreted as two values!
per isakson
per isakson 2012년 4월 9일
I seems impossible to use continuation lines in the code part. "..." just disappears

댓글을 달려면 로그인하십시오.


Image Analyst
Image Analyst 2012년 4월 8일
Why not use a space as the delimiter? Or alternatively, try dlmread() with a space as the delimiter? It might have trouble with the colons in the time though (not sure).
  댓글 수: 3
Image Analyst
Image Analyst 2012년 4월 8일
So did you pass it a space: ' '??? Or did you pass it no characters or multiple characters?
Ratna
Ratna 2012년 4월 9일
I passed as space ' ' and also with ''

댓글을 달려면 로그인하십시오.


Peter Wittenberg
Peter Wittenberg 2012년 4월 8일
What I have done for similar circumstances (and I have seen even worse) is to use the importdata function. Now importdata will read your whole set of data in as a string
myData = importdata(fullfile(mypathname, fileToBeConverted),',');
What happens now is that you have a big set of data. Now
remain = myData(2:end,:)
will cut out a single header line (change it if you have more). Now it is time to parse the data.
[dateStrin, remain] = strtok(remain,','); %#ok<*STTOK>
time=datenum(dateStrin,'HH:MM:SS.FFF');
What this does is to pull in your datestring with all those ugly colons and format it into a MATLAB date string. You'll need to adjust it for the day on which your events occurred and you may need to fiddle with it a bit more. Now we need to look out for all those other variables. We do that with strtok. The last set of code already removed the data string from remain, so we're ready to move ahead.
[myfirstnumvar, remain] = strtok(remain,',');
firstvar=str2num(char(myfirstnumvar)); %#ok<*ST2NM>
Now we've done the same thing with the first numeric variable. We go ahead like this for each of the columns that you have. while this is a modification from code I have actually written and successfully used, you will proba bly need small modifications to get yours to work The basic concept is to take the whole mess in as a string and then go after each column separately. You will need to remove the comma from one of your string variables. Some might ask why I produced something elaborate using successive string tokens. The basic reason is that the formats of the data changed (even worse than yours), so I need to read in the data as strings and have the flexibility to make my changes to each variable for its format. It's not really that hard to do wthis way. You are beyond using MATLAB's automated data reading to read in your whole file without incident, so this approach takes everything as strings and allows you to do your own parsing differently for each variable.
  댓글 수: 2
Ratna
Ratna 2012년 4월 9일
my data has text in the first line, import data won't work
per isakson
per isakson 2012년 4월 9일
How should we know? Add "'Headerlines', 1" to the first textscan call.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Text Data Preparation에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by