I have a ASCII data file with unknown number of columns and rows. In the file there are some unwanted text lines on top. I want to read only some of the columns in the data set with the header by removing the text lines on top. I appreciate if anyone can help.

>> doc textscan and use the parameter |HeaderLines| to skip the header. Supposing that you have 7 header lines and that you need columns 1 and 3 (discarding the rest of each line), you should have something like: content = fileread( 'myData.txt' ) ; data = textscan( content, '%f%*f%f%*[^\n]', 'HeaderLines', 7, ... 'CollectOutput', true ) ; data = data{1} ; where you see a |*| in the _formatSpec_ argument to skip column 2 and |%*[^\n]| at the end to skip the rest of the line.

How can I extract data from a large ASCII file?

Cedric 2014년 5월 20일

By T and Z you mean columns 20 and 26 (like in Excel), or T_fit and another column whose header contains Z?

anton fernando 2014년 5월 20일

편집: anton fernando 2014년 5월 20일

I need to extract the H2o and O3 columns. it is 4th and 6th column in the data set that I posted.

Cedric 2014년 5월 20일

편집: Cedric 2014년 5월 20일

Do you need to identify that these are columns 4 and 6 programmatically (based on pattern matching in the header), or are H2O and O3 always columns 4 and 6 (in all files)?

Cedric 2014년 5월 20일

편집: Cedric 2014년 5월 20일

MATLAB Online에서 열기

If these are always columns 4 and 6, you can use the following formatSpec (2nd argument in the call to TEXTSCAN):

'%*f%*f%*f%f%*f%f%*[^\n]'

where the %*f sequences skip (float) columns 1,2,3,5, the %f sequences read (float) columns 4 and 6, and the %*[^\n] sequence skips the rest of lines. Note that I cannot determine based on your picture whether there are 6 or 7 header lines, so you'll have to adjust this number.

anton fernando 2014년 5월 20일

편집: anton fernando 2014년 5월 20일

Actually your code worked. Thank you very much. How can I read the data set when I don't know the column number and only when I know the column heading? Thank you again. You saved me lot of time from wasting. You are the best.

Cedric 2014년 5월 20일

See my last comment for columns 4 and 6.

If you are able to build the formatSpec based on the a priori knowledge of which columns (number) you need, the approach above is by far the easiest way to process your file(s).

If you need to find programmatically columns number based on header names 'H2O' and 'O3' (and others), it gets significantly more complicated. If you really need this feature, I can extend my solution though; just let me know.

anton fernando 2014년 5월 20일

Hey, I can understand what you have done here now. But there is a problem. I don't get data as they are in the data file. Some data read as 0. I want to get data in (0.350E+13) format. Also what ('CollectOutput', true) part does in your code? Thank you. P.S: I think I can read data by the column number. All data are in same column in each file. Thank you again.

Cedric 2014년 5월 20일

편집: Cedric 2014년 5월 20일

My guess is that the code reads data correctly, but you don't display them correctly afterwards. For example, if you just type data, MATLAB will factor 1.0e+12, and display a lot of 0.0000 which are in fact non-zero. Try to access one of these elements direct, e.g. data(10,1). If it isn't that, could you attach (icon staple) one of these files instead of a screenshot so I can try? If files are large, truncate one to e.g. 50 rows.

The CollectOutput parameter groups columns in data{1} (in this case). Without it, you get one cell per column, which is more difficult to use then (you have to merge cells content manually afterwards).

anton fernando 2014년 5월 20일

The file is attached.

Cedric 2014년 5월 20일

편집: Cedric 2014년 5월 20일

MATLAB Online에서 열기

Yep, that's it. You probably updated the example to

 data = textscan( content, '%*f%*f%*f%f%*f%f%*[^\n]', 'HeaderLines', 12, ...
                  'CollectOutput', true ) ;
 data = data{1} ;

Now you can see your data e.g. per row, and you'll see that they are non-zero:

 >> data(50,:)
 ans =
   1.0e-03 *
    0.5940    0.0073

Optionally, if you really wanted to display the whole array and see non-zero entries, you could type

>> format long

Then ..

 >> data
 data =
0e+02 *
009380000000000  -9.990000000000000
008250000000000  -9.990000000000000
007240000000000  -9.990000000000000
006350000000000  -9.990000000000000
005550000000000  -9.990000000000000
004840000000000  -9.990000000000000
004200000000000  -9.990000000000000
003630000000000  -9.990000000000000
003130000000000  -9.990000000000000
002670000000000   0.000000614000000
002280000000000   0.000000292000000
001950000000000   0.000000110000000
001680000000000   0.000000076300000
001440000000000   0.000000035600000
001230000000000   0.000000047500000
001060000000000   0.000000042100000
000906000000000   0.000000042300000
000775000000000   0.000000042200000
000664000000000   0.000000043400000
000567000000000   0.000000045600000
000484000000000   0.000000047000000
000413000000000   0.000000048500000
000353000000000   0.000000049900000
000301000000000   0.000000051300000
000256000000000   0.000000052600000
   ...

but again, the fact that you displayed a lot of zeros initially is just a display artifact (truncation), so there is no need to set the display format to long if all you need is to compute with these numbers afterwards.

anton fernando 2014년 5월 20일

Thank you very much. One last question. if I need the data of the 80th column do I have to write %f 80 times?

Cedric 2014년 5월 20일

편집: Cedric 2014년 5월 20일

MATLAB Online에서 열기

You're welcome. Interestingly, REPMAT works with strings as well (as with numeric/cell arrays), so no, if you needed columns 4, 6, and 80, you could do something like (not tested, you may have to check and adapt/correct):

 formatSpec = ['%*f%*f%*f%f%*f%f', repmat('%*f', 1, 73), '%f%*[^\n]'] ;
 data = textscan( content, formatSpec, 'HeaderLines', 12, ...
                  'CollectOutput', true ) ;
 data = data{1} ;

anton fernando 2014년 5월 20일

Thank you. I really appreciate your help.

Cedric 2014년 5월 20일

My pleasure!

How can I extract data from a large ASCII file?

댓글 수: 0
이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 14
이전 댓글 12개 표시 이전 댓글 12개 숨기기

추가 답변 (0개)

카테고리

태그

Community Treasure Hunt

How can I extract data from a large ASCII file?

댓글 수: 0 이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 14 이전 댓글 12개 표시 이전 댓글 12개 숨기기

추가 답변 (0개)

카테고리

태그

참고 항목

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

댓글 수: 14
이전 댓글 12개 표시 이전 댓글 12개 숨기기