Get every nth row of a tall array

Dan Houck on 18 Aug 2022
Commented: dpb on 22 Aug 2022
I have a tall array and would like to collect every 26th row of one variable into an array. I tried:
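U = tall(udata); % udata is the datastore; this tall array is not used by the loop below
hhws = [];
udata.ReadSize = 26*500; % data comes in 26-row blocks, so each chunk holds whole blocks
while hasdata(udata)
    U = read(udata); % overwrites U with a table of up to ReadSize rows
    hhws = [hhws; U.Var13(14:26:end)]; % every 26th row, starting with the 14th row
end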
This produced the error:
Error using matlab.io.datastore.TabularTextDatastore/readData (line 78)
Unable to parse a "Numeric" field when reading row 10765, field 1.
Actual Text: "******** 7.909"
Expected: A number or literal "NaN", "Inf". (possibly signed, case insensitive)
Error in matlab.io.datastore.TabularDatastore/read (line 174)
[t, info] = ds.readData();
Caused by:
Reading the variable name 'Var1' using format '%f' from file: '<file path and file name>' starting at offset 1011702139.
Seems like maybe there's a problem with how I'm reading the file in? Is the method above viable assuming I get through this error? Thanks!
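On the viability question, a small check on synthetic data (not the poster's file) suggests the chunked approach works as long as every chunk starts on a block boundary. The names blockLen, offset, and chunkRows are illustrative only, and a plain column vector stands in for the datastore reads.

```matlab
% Compare strided indexing over a whole column with the same indexing
% applied chunk by chunk, where each chunk holds whole 26-row blocks.
blockLen  = 26;                   % rows per block (from the question)
offset    = 14;                   % wanted row within each block
x         = (1:blockLen*7).';     % synthetic column: 7 blocks of 26 rows
expected  = x(offset:blockLen:end);

chunkRows = blockLen*3;           % chunk size is a multiple of the block length
got = [];
for startRow = 1:chunkRows:numel(x)
    stopRow = min(startRow + chunkRows - 1, numel(x));
    chunk   = x(startRow:stopRow);            % stands in for read(udata)
    got     = [got; chunk(offset:blockLen:end)]; %#ok<AGROW>
end

assert(isequal(got, expected))    % chunked result matches the direct result
```

If a read ever returned a chunk that did not start on a block boundary (for example at a file boundary in a multi-file datastore), the per-chunk offset would drift, so that case would need separate handling.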

Accepted Answer

dpb on 18 Aug 2022
Edited: dpb on 20 Aug 2022
Actual Text: "******** 7.909"
The problem is in the data file itself: a numeric field contains the overflow indicator "********", and a formatted read cannot convert that text to a numeric value.
You would need to add
'TreatAsMissing',{'********',''}
to the datastore when you create it.
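A minimal sketch of that suggestion, assuming a space-delimited file with two numeric columns. The sample file, its contents, and the explicit 'TextscanFormats' are assumptions made so the example is self-contained; the real file layout is not shown in the thread. It uses the single character vector form of 'TreatAsMissing'.

```matlab
% Write a tiny sample file (hypothetical data) containing an overflow field.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1.0 2.0\n');
fprintf(fid, '******** 7.909\n');
fprintf(fid, '3.0 4.0\n');
fclose(fid);

% Create the datastore and treat the asterisk field as missing (NaN).
% The formats are given explicitly so that format detection on this
% very short file does not classify the first column as text.
ds = tabularTextDatastore(fname, ...
    'ReadVariableNames', false, ...
    'Delimiter', ' ', ...
    'TextscanFormats', {'%f', '%f'}, ...
    'TreatAsMissing', '********');

T = read(ds);      % table with variables Var1 and Var2
disp(T)
delete(fname);     % clean up the temporary file
```

With this option in place, the overflow field should be read as NaN rather than stopping the read with a parse error.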
I've not really used the datastore much, and I didn't find an equivalent there. With detectImportOptions and the resulting text import options object, though, there is also an 'ImportErrorRule' parameter that can substitute a 'FillValue'; in that case the fill value could be set to Inf instead of NaN to mark the overflow entries specifically and leave the missing fields just as empty. This seems an oversight unless I just missed it in the doc; the options available for the datastore appear to be less extensive.
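The import-options route could look roughly like the following, using readtable rather than a datastore. The sample file is hypothetical, and forcing the first variable to double with setvartype is an assumption needed because detection on such a short file may classify that column as text.

```matlab
% Write a tiny sample file (hypothetical data) containing an overflow field.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1.0 2.0\n******** 7.909\n3.0 4.0\n');
fclose(fid);

% Detect import options, then make conversion errors in the first
% variable fill with Inf instead of the default NaN.
opts = detectImportOptions(fname, 'FileType', 'text', 'Delimiter', ' ');
firstVar = opts.VariableNames{1};
opts = setvartype(opts, firstVar, 'double');         % read the column as numeric
opts = setvaropts(opts, firstVar, 'FillValue', Inf); % value used when filling
opts.ImportErrorRule = 'fill';                       % fill on conversion errors

T = readtable(fname, opts);
disp(T)
delete(fname);     % clean up the temporary file
```

As far as I know, the same 'FillValue' is also used when the missing-data rule is 'fill', so it is worth checking how truly empty fields come out before relying on Inf as an overflow marker.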
4 Comments
Dan Houck on 22 Aug 2022
Got it to work! Just had to change 'TreatAsMissing',{'********',''} to just 'TreatAsMissing','********', though I don't understand why that made the difference.
dpb on 22 Aug 2022
That does seem peculiar; the empty string is the default, and the option is supposed to accept either form.
That might be worth a support question to TMW (The MathWorks) to ask whether this is the expected result.

Release

R2021b
