I have a set of csv files which have non-comma-delimited header and footer text (corporate boilerplate). The footer text seems to be preventing readtable from parsing the file, and as the number of rows of data vary from one csv to another, I can't hard-code the data ranges.
Is there a way to open such a file, find a spefic string therein, and use the row index thereof to bound readtable or similar functions? Alternatively, what would be the best way to go about extracting the comma-delimited data in this situation?

댓글 수: 5

dpb
dpb 2022년 6월 21일
It's certainly possible to scan the file to find the header/trailer lines -- whether that's best or not would depend upon the actual format and content of the files themselves.
Attach a sample that illustrates -- and how/what you tried.
See the attached for an example of the format from which I'm trying to extract data.
So far I only tried to use readtable, which threw a fault stating "All lines of a text file must have the same number of delimiters. Line 34 has 0 delimiters, while the preceeding lines have 28." Upon reading that, I started digging into the various import functions, however none of the function descriptions nor any of the other Q&A posts here on MATLAB Answers have addressed a way to avoid the footer text.
dpb
dpb 2022년 6월 21일
Attach a real data file, so can actually test reading it...
Like @dpb says, give a real data file with actual numbers in it instead of this bogus useless one. I was going to try importdata which normally gives separate fields for headers and numbers but with your fake csv it was basically garbage.
can you share the sample file to to test, so as to provide an proper suggestion

댓글을 달려면 로그인하십시오.

 채택된 답변

per isakson
per isakson 2022년 6월 22일
I've replaced "data" in your file by "3.14" and I've skipped the the two column header lines, because they are weird.
Here is an oldtimers solution. (The only problem is to get the number of columns right.)
fid = fopen( 'FormatExample.csv', 'r' );
cac = textscan( fid, '%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f' ...
, 'Headerlines',4, 'CollectOutput',true, 'Delimiter',',' );
[~] = fclose( fid );
cac
cac = 1×1 cell array
{30×29 double}
cac{1}(1:3,1:5)
ans = 3×5
3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400 3.1400
To make readtable read your file, you need to "fix" delimitedTextImportOptions - I guess.

댓글 수: 3

A useful idiom picked up from @Steven Lord many years ago for the proverbially ugly repeated format string running on interminably is
fmt=repmat('%f',1,29);
fid = fopen( 'FormatExample.csv', 'r' );
cac = textscan(fid, fmt,'Headerlines',4, 'CollectOutput',true, 'Delimiter',',' );
[~] = fclose( fid );
Of course, one still must find the number of columns/delimiters in the body of the file if there is no trail of bread crumbs to enable one to determine so from the header information in the file, or it is known a priori to be fixed.
We simply don't have sufficient information to go on well here...
Whelp, I feel rather foolish. Apparently all I had to do was run the detectImportOptions function and feed the result into readtable. I incorrectly assumed that readtable invoked detectImportOptions prior to trying to read the data. Thank you all for your help.
dpb
dpb 2022년 6월 22일
" I incorrectly assumed that readtable invoked detectImportOptions ..."
The readXXX family does some less sophisticated parsing than full-blown detectImportOptions does in an attempt to be more time-saving...of course, if it doesn't work, one ends up using a lot more time debugging and going back again...

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

dpb
dpb 2022년 6월 22일

1 개 추천

Another approach although with a real file so could figure out how to interpret what is actually the filel content and if had details on just what is needed/wanted -- for example, there are what appear to be both variables and units on the same record in the file if there is any truth at all in the headings -- but there are 13 variables and only 7 "Unit" indicators and 29 data fields/record. It's not, therefore, exactly possible to know what belongs with what -- 13 x 2 ==>26 so the number of variables plus a units field for each is short three columns.
As @per isakson hints, I'd begin with knowing what the content of the file is and use that as additional information for detectimportoptions to build a text file import object to use with one of the higher level routines like readtable if that were the appropriate data structure to use -- again, that would only be knowable from details of the file we don't have and what is to be done with the data once loaded.
But, to revert back to the original Q? posed, the rough outline to brute-force it would be something like
l=readlines('FormatExample.csv');
nHdr=find(startsWith('VariableName1'))+1;
nTrlr=find(count(l,',')==strlength(l),1)-1;
data=str2double(l(nHdr:nTrlr));

카테고리

도움말 센터File Exchange에서 Language Support에 대해 자세히 알아보기

제품

릴리스

R2019b

질문:

2022년 6월 21일

댓글:

dpb
2022년 6월 22일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by