how to use readtable with excel spreadsheets, skip extra header lines, and read to "end"?
조회 수: 61 (최근 30일)
이전 댓글 표시
readtable apparently doesn't support "HeaderLines" option for xls files. From documentation, it does support reading ranges like
'Range','D2:H4'
But I can't find the magic format to get it to read to an arbitrary 'end'.
'Range','D2:end'
doesn't work, nor any obvious variations, such as
'Range','D2:M.end' .
I have a set of data files with arbitrary lengths and arbitrary # of columns, which have 4 header lines. I'd like to read them into tables, picking up the first line as column labels, skipping lines 2-4, then reading the rest of the file.
T = readtable('plot 1.xlsx');
results in 1st line used as headers (good), but text of headers in lines 2-4 cause all remaining (numeric) data to be read in as text, not numeric data (bad).
Anybody got a working solution to this problem? Am running matlab 2015a under OS X 10.10 (Yosemite).
댓글 수: 3
Tasio Oderkerk
2016년 5월 19일
Same issue, Matlab 2016, removed header lines by hand.
If anyone has a better solution, please resp.
Bridget Eckhardt
2017년 6월 8일
편집: Bridget Eckhardt
2017년 6월 8일
We should be able to use the following, but I still get a problem with mine. It seems to randomly choose which is the header line even though I have specified which it is. However, the data range specification does work. So, this starts identifying data at A4.
opts=detectImportOptions('Test.xlsx');
opts.VariableNamesRange='A3';
opts.DataRange='A4';
답변 (1개)
Jeremy Hughes
2017년 6월 9일
Hi Ian, Without the actual file you're trying to read it's hard to say for sure, but I think I can help. Header lines are assumed to be at the beginning of the file, so since your first row contains the variable names, you really don't have any header lines (according to READTABLE's definition). Unfortunately, if the 'HeaderLines' parameter worked with spreadsheet files, based on your description of the file, it wouldn't help.
However in R2016b, DETECTIMPORTOPTIONS does allow a nearly-identical parameter 'NumHeaderLines', and using import options, you can set a starting cell for your DataRange. It will also attempt to get a best guess at which row is really data.
Try this:
opts = detectImportOptions('Test.xlsx','NumHeaderLines',0);
You can inspect the results to see if it matches your expectation, and it this didn't get exactly what you wanted:
opts.VariableNamesRange = 'A1';
opts.DataRange = 'A4';
Now you just need to tell READTABLE to use the options.
T = readtable('Test.xlsx',opts,'ReadVariableNames',true)
You might not need the ReadVariableNames parameter; READTABLE prefers to use the variable names in OPTS, and that works okay so long as DETECTIMPORTOPTIONS got them right. DETECTIMPORTOPTIONS will not set a VariableNamesRange/Line if the first row (after the detected or prescribed number of header lines) contains the same datatypes as the other rows--it assumes they are part of the data and that there are no variable names in this file.
I hope this helps. If you still have an issue, upload an example file and I'll see what I can do. Jeremy
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Spreadsheets에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!