Datastore readsize - unexpected behavior

Anders on 23 Jun 2023
Commented: Rik on 23 Jun 2023
I would expect the code below to read 40k lines from my datastore on each pass, but for reasons unknown to me the number of lines varies between passes.
ds = tabularTextDatastore(filename,'ReadSize',40000);
c = 0;
while hasdata(ds)
    c = c + 1;
    TT = read(ds);
    T = height(TT);
    if c==1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " + t_total + " ticks.")
end
This produces the output:
Done with 40000 ticks.
Done with 45096 ticks.
Done with 85096 ticks.
Done with 90190 ticks.
Done with 130190 ticks.
I would expect the increment to be 40k each time. The data is timestamped, and judging by the timestamps, the data in the CSV file "filename" does not seem to be corrupt in any way. That is, there are no missing timestamps when reading the data. Is there anything I can do to get 40k lines on each pass (except the last pass, of course)?
3 Comments
Anders on 23 Jun 2023
Sorry, I should have been more careful with the code example. Fixed that now. The actual data I'm using is proprietary, so I'm not allowed to share it. Would an example file with the same structure be helpful?
Rik on 23 Jun 2023
Anything that reproduces this problem is fine. You care about the actual data, we don't. For this problem, the only thing that matters is that the data produces the same results.
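A stand-in file could be generated along the following lines. This is only a sketch: the file name, the variable names Time and Price, and the row count are made up for illustration and are not taken from the original data. Whether the uneven row counts appear with this file may depend on its size, so the script only records the number of rows returned by each call to read rather than assuming a particular result.

```matlab
% Build a synthetic, timestamped CSV file (hypothetical structure).
nRows = 200000;                                   % assumed size, adjust as needed
t = datetime(2023,6,23) + seconds(0:nRows-1)';    % one timestamp per second
T = table(t, rand(nRows,1), 'VariableNames', {'Time','Price'});
filename = fullfile(tempdir, 'ticks_example.csv');
writetable(T, filename);

% Read it back with the same settings as in the question and
% record how many rows each call to read returns.
ds = tabularTextDatastore(filename, 'ReadSize', 40000);
rowsPerRead = [];
while hasdata(ds)
    rowsPerRead(end+1) = height(read(ds)); %#ok<AGROW>
end
disp(rowsPerRead)
```

If rowsPerRead contains values other than 40000 before the final entry, the file reproduces the behavior and can be attached in place of the proprietary data.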


Answers (1)

Sanskar on 23 Jun 2023
Hi Anders!
As I understand your question, you want to read 40k lines from your datastore on every pass, but after the first iteration of the loop the number of lines returned varies.
The 'ReadSize' property you are using tells read to return at most the number of rows given as its argument.
The 'hasdata' function, however, does not guarantee that exactly 'ReadSize' rows are returned.
Instead of 'hasdata', you can use 'isDone()' to check whether all the data has been read from the datastore.
Here is the modified code:
ds = tabularTextDatastore(filename, 'ReadSize', 40000);
c = 0;
while ~isDone(ds) % Use isDone instead of hasdata
    c = c + 1;
    data = read(ds);   % Read the next block of up to 40,000 lines
    T = height(data);  % Number of rows in this block
    if c == 1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " + t_total + " ticks.")
end
Here is the link to the documentation for isDone():
1 Comment
Anders on 23 Jun 2023
Edited: Anders on 23 Jun 2023
Hi Sanskar,
I get an "Unrecognized function or variable 'isDone'" error. Is isDone part of some toolbox? When I type which isDone, I get a 'not found' message.
If I understand the documentation correctly, isDone is used for System objects and cannot be used with datastores.
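Since 'ReadSize' is documented as an upper bound on the rows returned by read, one possible workaround is to collect rows in a buffer and only process them once a full block of 40k is available. The following is a minimal sketch of that idea, not a confirmed fix. The small synthetic file, its variable names Tick and Price, and the names blockSize, buffer and blockHeights are hypothetical and only there to make the example runnable; with real data, filename would point to the actual CSV file.

```matlab
% Hypothetical stand-in for the real file: 100,000 rows, two columns.
n = 100000;
filename = fullfile(tempdir, 'ticks_buffer_demo.csv');
writetable(table((1:n)', rand(n,1), 'VariableNames', {'Tick','Price'}), filename);

blockSize = 40000;
ds = tabularTextDatastore(filename, 'ReadSize', blockSize);
buffer = table();       % rows read from the datastore but not yet processed
blockHeights = [];      % height of every block that gets processed
t_total = 0;
while hasdata(ds)
    buffer = [buffer; read(ds)]; %#ok<AGROW> % read may return fewer than blockSize rows
    while height(buffer) >= blockSize
        block = buffer(1:blockSize, :);      % exactly blockSize rows
        buffer(1:blockSize, :) = [];         % keep the remainder for the next pass
        % ... process block here ...
        blockHeights(end+1) = height(block); %#ok<AGROW>
        t_total = t_total + height(block);
        disp("Done with " + t_total + " ticks.")
    end
end
if height(buffer) > 0                        % final, possibly shorter block
    blockHeights(end+1) = height(buffer);
    t_total = t_total + height(buffer);
    disp("Done with " + t_total + " ticks.")
end
```

With this approach every processed block except possibly the last has exactly blockSize rows, regardless of how many rows each individual call to read returns. The cost is that up to two reads' worth of rows are briefly held in memory at once.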


Release

R2022b
