Skip Lines (other than the Header) when Importing CSV File
I have read a couple of entries about skipping header information when importing CSV files. While I don't fully understand them yet, I know that I'll also need to skip lines with text interspersed in with my data as well. How would I import a CSV file that has a Header but also includes lines of text between "blocks" of data? For instance, in the attached file, Lines 1-45 can be considered the "Header" and are easily skipped over. Lines 46-74 contain the actual data... skipping Lines 75-76... and then Lines 77-105 contain the next "block" of data. This pattern repeats and, depending on the length of file to be handled, could repeat a couple of thousand times (meaning could have around 2K "blocks" of data). I would like to be able to import the data blocks only so that I can do math (summing, averaging, max and min values) for specific "blocks" of data... I could do this in Excel, but I don't know how to automate the process without using Matlab. Any suggestions would be appreciated. Thank you.
Answers (2)
per isakson
February 10, 2014
Edited: per isakson
February 11, 2014
Are there any string values that can be used as "Begin" and "End" markers for the blocks?
.
[The following day]
Try this
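% Read the whole file into one character vector
str = fileread('cssm.txt');
% Look-behind: the block must be preceded by a "Frame N" line
look_behind = '(?<=Frame \d{1,3}\s*\n)';
% Look-ahead: the block ends before the next "Frame N" line or at the end of the text
look_ahead = '(?=(\s*Frame \d{1,3}\s*)|(\s*$))';
% Lazily match digits, decimal points and whitespace (the numeric block)
expr2match = '[0-9\.\s]+?';
cac = regexp( str, [look_behind,expr2match,look_ahead], 'match' );
% Display the third block
cac{3}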
where cssm.txt contains
Frame 1
11.1 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 2
22.2 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 99
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
returns
ans =
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
.
Understanding this might take hours (or more) of reading and experimenting with regular expressions, especially "Lookaround Assertions". However, it is worth the effort.
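As a starting point for experimenting, here is a minimal sketch of a look-behind and a look-ahead on a single line of text. The sample string is hypothetical (it is not taken from the attached file), and the look-behind is deliberately fixed-length with a single-digit frame number to keep the example simple.

```matlab
% Hypothetical one-line sample with two "blocks" of numbers
str = 'Frame 1 10 20 Frame 2 30 40';

% (?<=Frame \d )  look-behind: the match must be preceded by "Frame <digit> "
% [\d ]+?         lazily match digits and spaces
% (?= Frame|$)    look-ahead: stop before the next " Frame" or at the end of the text
tok = regexp( str, '(?<=Frame \d )[\d ]+?(?= Frame|$)', 'match' );

% The "Frame N" markers are checked but not included in the matches
disp(tok)
```

The assertions only test the surrounding text; they do not consume it, which is why the markers never show up in tok.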
.
Use textscan to convert the text to numeric values:
buf = textscan( transpose(cac{3}), '%f%f%f%f', 'CollectOutput',true );
and
>> buf{1}
ans =
33.3000 2.0000 3.0000 13.0000
5.0000 11.0000 10.0000 8.0000
9.0000 7.0000 6.0000 12.0000
4.0000 14.0000 15.0000 1.0000
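To get the per-block sums, averages, maxima and minima asked about in the question, the same two steps could be applied to every block in a loop. The following is only a sketch: the sample text stands in for the real file, the struct name stats and its fields are hypothetical, and every block is assumed to have four numeric columns as in cssm.txt.

```matlab
% Sample text standing in for fileread('cssm.txt'); two short blocks only
str = sprintf('Frame 1\n1 2 3 4\n5 6 7 8\nFrame 2\n10 20 30 40\n50 60 70 80\n');

% Same pattern as in the answer
look_behind = '(?<=Frame \d{1,3}\s*\n)';
look_ahead  = '(?=(\s*Frame \d{1,3}\s*)|(\s*$))';
expr2match  = '[0-9\.\s]+?';
cac = regexp( str, [look_behind,expr2match,look_ahead], 'match' );

% Preallocate a struct array with one element per block (hypothetical layout)
nBlocks = numel(cac);
stats = struct('sum',cell(1,nBlocks),'mean',[],'max',[],'min',[]);

for k = 1:nBlocks
    % Convert the text of block k to a numeric matrix
    buf  = textscan( cac{k}, '%f%f%f%f', 'CollectOutput',true );
    data = buf{1};
    % Statistics over all values in the block
    stats(k).sum  = sum(data(:));
    stats(k).mean = mean(data(:));
    stats(k).max  = max(data(:));
    stats(k).min  = min(data(:));
end
```

For per-column rather than whole-block statistics, sum(data,1), mean(data,1) and so on could be used instead of operating on data(:).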
Comments: 3
per isakson
February 10, 2014
Edited: per isakson
February 11, 2014
Here is an alternative value for look_behind, which is 'cleaner':
look_behind = '(?<=Frame \d{1,3}\s+)';
I had trouble making \d{1,3} match as many digits as possible, i.e. making it greedy. Next try:
look_behind = '(?<=Frame \d++\s+)';
\d++ is a possessive quantifier: it matches all consecutive digits present at that position and does not give any of them back.
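The difference between \d+ and \d++ can be seen in a small experiment. This is a sketch with a hypothetical sample string; the trailing \d in each pattern is there only to force the engine to backtrack if it is allowed to.

```matlab
% Hypothetical sample text
str = 'Frame 123';

% Greedy \d+ first takes '123', then gives back one digit so the final \d can match
greedy = regexp( str, 'Frame \d+\d', 'match', 'once' );

% Possessive \d++ takes '123' and never gives digits back, so the final \d cannot match
possessive = regexp( str, 'Frame \d++\d', 'match', 'once' );

disp(greedy)               % the greedy pattern matches the whole text
disp(isempty(possessive))  % the possessive pattern finds no match
```

Inside the look-behind, this means the run of digits after "Frame " is always consumed as a whole.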
per isakson
February 10, 2014
Edited: per isakson
February 11, 2014
I didn't study Kelly's answer. However, note that length is a MATLAB function.