Ignoring headers/footers in text files
Hello,
For the past week I've been trying to open multiple text files that have different headers/footers at the same time, ignoring all of the headers/footers and extracting just the data, without knowing in advance what the headers/footers are. The only thing I know is that every header/footer line starts with a non-numeric character and is a string.
Examples:
File 1:
Line 1 of file - Samplerate : 100000
Line 2 of file - Bitspersample: 12
Rest of lines - data (2000 samples, floats)
File 2:
Line 1 of file - Bitspersample: 32
Line 2 of file - Normalized: FALSE
Lines 3-2500 - data (2500 samples, floats)
Line 2501 of file - Channel: A
Is there a way to ignore all lines of a text file that start with a non-numeric character (i.e. lines that are strings)?
Answers (1)
Walter Roberson
29 January 2020
fileread() the file.
Use regexprep() with the pattern '^\s*[^0-9+.-].*$', the replacement '' (the empty string), and the 'lineanchors' and 'dotexceptnewline' options. This blanks the content of every line whose first non-whitespace character is not a digit, +, -, or period. If your numbers never have a leading +, leave the + out of the pattern. If your numbers never start with a period (that is, a value is always written as 0.5, never as .5), leave the period out of the pattern.
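The effect of including or leaving out the period can be checked on a made-up snippet. The sample text below is an assumption, not one of the asker's files, and the call passes 'dotexceptnewline' (used in the follow-up comment) so that .* stops at the end of each line.

```matlab
% Hypothetical sample: one header line, then values written as .5, +2 and -3.
str = sprintf('Samplerate : 100000\n.5\n+2\n-3\n');

% Period inside the class: lines starting with '.' are treated as numbers.
withPeriod = regexprep(str, '^\s*[^0-9+.-].*$', '', 'lineanchors', 'dotexceptnewline');
% Period left out: the '.5' line is blanked as if it were a header.
noPeriod = regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline');

A = textscan(withPeriod, '%f');
B = textscan(noPeriod, '%f');
disp(A{1}.')   % values kept when the period is allowed
disp(B{1}.')   % values kept when it is not
```

So dropping the period from the pattern is only safe when every value is written with a leading digit, such as 0.5.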
If your data never has a leading +, -, or period, you can use the simpler pattern '^\s*\D.*$' instead of the one above.
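A sketch of the shorter pattern on made-up unsigned data. It also illustrates a caveat that goes beyond the original answer: \D matches a space as well, so for an indented line such as '  30' the regex engine can shorten \s* by one character and let \D consume a space, blanking a valid data line (the class [^0-9+.-] behaves the same way). The variant '^\s*[^\d\s].*$' used below is an assumed fix that excludes whitespace from the first-character class.

```matlab
% Hypothetical sample: header lines, then unsigned values, one of them indented.
str = sprintf('Samplerate : 100000\nBitspersample: 12\n10\n2.5\n  30\n');

% Shorter pattern: first non-blank character is not a digit.
simple = regexprep(str, '^\s*\D.*$', '', 'lineanchors', 'dotexceptnewline');
% Assumed variant: the first non-blank character must be neither a digit nor
% whitespace, so backtracking into the indentation cannot produce a match.
strict = regexprep(str, '^\s*[^\d\s].*$', '', 'lineanchors', 'dotexceptnewline');

S1 = textscan(simple, '%f');   % the indented '  30' line has been blanked
S2 = textscan(strict, '%f');   % the indented line survives
disp(S1{1}.'), disp(S2{1}.')
```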
After the regexprep, textscan() the string.
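Put together for several files, the fileread, regexprep, textscan recipe might look like the sketch below. The file names and contents are hypothetical stand-ins for the asker's "File 1" and "File 2", written to tempdir so the example runs on its own; the 'dotexceptnewline' option from the follow-up comment is included.

```matlab
% Hypothetical stand-ins for "File 1" and "File 2", written to tempdir.
files = {fullfile(tempdir, 'file1_example.txt'), fullfile(tempdir, 'file2_example.txt')};
contents = {sprintf('Samplerate : 100000\nBitspersample: 12\n0.5\n1.5\n'), ...
            sprintf('Bitspersample: 32\nNormalized: FALSE\n-2.25\n.75\nChannel: A\n')};
for k = 1:numel(files)
    fid = fopen(files{k}, 'w');
    fprintf(fid, '%s', contents{k});
    fclose(fid);
end

data = cell(size(files));
for k = 1:numel(files)
    str = fileread(files{k});
    % Blank every line whose first non-blank character cannot start a number.
    str = regexprep(str, '^\s*[^0-9+.-].*$', '', 'lineanchors', 'dotexceptnewline');
    % Parse the remaining numbers; the emptied lines are skipped.
    C = textscan(str, '%f');
    data{k} = C{1};
    delete(files{k});   % remove the temporary example file
end
fprintf('File %d: %d values\n', [1:numel(data); cellfun(@numel, data)]);
```

Because the header and footer lines are recognised only by their first character, the same loop works regardless of how many such lines each file has or where they appear.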
Comments: 2
Walter Roberson
29 January 2020
regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline')
[] means any one character chosen from the list inside the brackets, except when the first thing inside the [] is ^, in which case it means any one character that is NOT one of the listed ones. So the construct matches any one character that is not 0123456789, +, or -. In short, you are looking for lines in which the first nonblank character cannot possibly be the start of a number.
The .* after that, with the 'dotexceptnewline' option, matches up to the end of the same line. Each such line is replaced with nothing (the newline character itself is not removed), so you get an empty line in place of every line that starts with a non-number.
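A small check of the regexprep call above on a made-up two-line header. The blanked lines keep their newline characters, so the line count is unchanged. Leaving out 'dotexceptnewline' falls back to MATLAB's default 'dotall' behaviour, where . also matches newlines, so the first match runs from the first header to the end of the text and wipes out the data too.

```matlab
% Hypothetical sample text: a two-line header followed by two numbers.
str = sprintf('Samplerate : 100000\nBitspersample: 12\n1.25\n-2.5\n');
pattern = '^\s*[^0-9+-].*$';

% With 'dotexceptnewline', each header line becomes an empty line.
out = regexprep(str, pattern, '', 'lineanchors', 'dotexceptnewline');
fprintf('%d newlines before, %d after\n', sum(str == char(10)), sum(out == char(10)));

% Without it, '.' also matches newlines, so one greedy match swallows
% everything from the first header line to the end of the string.
wiped = regexprep(str, pattern, '', 'lineanchors');
fprintf('length without dotexceptnewline: %d\n', numel(wiped));
```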