How to solve the problem of fewer records than expected being read from a datastore?
I am using a datastore to read data from a CSV file that has over 7 million records. The problem occurs when I set the ReadSize property to 500000 or 1000000: after a single read I get only about 100000 records. I see this issue in MATLAB R2014b and R2015a. Where could the problem be?
0 Comments
Answers (2)
Omar Qallaa
9 Feb 2017
Edited: Omar Qallaa
9 Feb 2017
I have a very similar problem: I'm reading a very large single CSV file that has ~100M records of just one feature. Setting "ReadSize" does not guarantee that "read" returns a constant number of records. I solved this as follows:
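% requiredSamples is the desired batch size; ds.ReadSize is assumed to be set to it already
while hasdata(ds)
    % Start a new batch; read may return fewer rows than ReadSize
    data = read(ds);
    if (size(data,1) ~= requiredSamples) && hasdata(ds)
        % Change read size to number of missing samples
        ds.ReadSize = requiredSamples - size(data,1);
        % Single top-up read; this can still fall short if read returns fewer rows again
        tmp = read(ds);
        data = vertcat(data,tmp);
        % Set read size back to requiredSamples
        ds.ReadSize = requiredSamples;
    end
    % Process the batch in data here before the next iteration overwrites it
end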
3 Comments
Walter Roberson
3 Apr 2019
The person had wanted to get a full batch of size ReadSize, and process that batch, and then move on to the next batch as long as there was data. The person did not want to accumulate all of the data for the entire file. The assignment of the read(ds) is overwriting, but overwriting in the context of being the start of a batch, at a time when they want the entire batch to be overwritten with ReadSize new records.
thomassimm
4 Apr 2019
Ok,
The 100M confuses me because the most data I can get in one read is ~30k lines, presumably due to the 32 MB mentioned below. So if requiredSamples > 30000*2, then it won't work for me. Mine are 80k+ lines.
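When the required batch is more than twice what a single read returns, one top-up read is not enough. A minimal sketch of a variant that keeps topping up until the batch is full or the data runs out is shown here. It is not the original poster's code: the temporary CSV file, the column name x, the value of requiredSamples and the batchSizes bookkeeping are all assumptions made so the example is self-contained.

```matlab
% Build a small CSV so the sketch is self-contained (hypothetical data).
csvFile = [tempname '.csv'];
writetable(table((1:25)', 'VariableNames', {'x'}), csvFile);

ds = datastore(csvFile);
requiredSamples = 10;   % desired rows per batch (example value)
batchSizes = [];        % bookkeeping only, to inspect the result afterwards

while hasdata(ds)
    % Start a new batch with the full read size.
    ds.ReadSize = requiredSamples;
    batch = read(ds);
    % Keep topping up until the batch is full or the datastore is exhausted.
    while height(batch) < requiredSamples && hasdata(ds)
        % Ask only for the rows still missing from this batch.
        ds.ReadSize = requiredSamples - height(batch);
        batch = [batch; read(ds)];
    end
    % Process the batch here; this sketch only records its size.
    batchSizes(end+1) = height(batch);
end

delete(csvFile);
```

Every batch except possibly the last should then hold exactly requiredSamples rows, however many rows each individual read returns. Changing ReadSize between two numeric values does not reset the datastore, which is what makes the top-up approach possible.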
Aaditya Kalsi
15 Dec 2015
I believe that datastore has an upper limit on the amount of data it reads from a file at once, and that you are running up against it. Also, the 'ReadSize' property is an upper limit, so technically this could be expected behaviour.
Is there a reason you require the exact number of rows? Does your algorithm depend on this?
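If the algorithm does not need a fixed batch size, it can simply work with whatever each read returns. As an illustration only, here is a sketch that computes a mean by accumulating a running sum and row count, so the result does not depend on how many rows each batch contains. The temporary CSV file, the column name x and the ReadSize value are assumptions for the example.

```matlab
% Build a small CSV so the sketch is self-contained (hypothetical data).
csvFile = [tempname '.csv'];
writetable(table((1:25)', 'VariableNames', {'x'}), csvFile);

ds = datastore(csvFile);
ds.ReadSize = 7;   % treated as an upper bound only

total = 0;   % running sum of column x
count = 0;   % running number of rows seen
while hasdata(ds)
    batch = read(ds);                  % any number of rows up to ReadSize
    total = total + sum(batch.x);
    count = count + height(batch);
end
meanX = total / count;                 % mean over all rows, independent of batch sizes

delete(csvFile);
```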
4 Comments
Aaditya Kalsi
17 Dec 2015
Edited: Aaditya Kalsi
17 Dec 2015
I believe that is expected. My suspicion is that either you are reading multiple files, each of which has only up to about 1000 records, or you have a large file in which each record is very large.
I'm quoting the documentation here:
ReadSize — Amount of data to read
20000 (default) | positive scalar | 'file'
Amount of data to read in a call to the read function, specified as a positive scalar or the string, 'file'.
If ReadSize is a positive integer, then each call to read reads up to the specified number of rows from the datastore.
If ReadSize is 'file', then each call to read reads all of the data in one file.
When you change ReadSize from a numeric scalar to 'file' or vice versa, MATLAB resets the datastore to the state where no data has been read from it.
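To see the "up to" behaviour described in the documentation quoted above, a short sketch like the following can record how many rows each read call returns and then switch to 'file' mode. The temporary CSV file, the column name x and the variable names rowsPerRead and whole are assumptions for the example. The explicit reset call is only a precaution, since the quoted documentation says that switching from a numeric ReadSize to 'file' already resets the datastore.

```matlab
% Build a small CSV so the sketch is self-contained (hypothetical data).
csvFile = [tempname '.csv'];
writetable(table((1:25)', 'VariableNames', {'x'}), csvFile);

ds = datastore(csvFile);

% Numeric ReadSize: each read returns at most this many rows.
ds.ReadSize = 10;
rowsPerRead = [];
while hasdata(ds)
    rowsPerRead(end+1) = height(read(ds));
end

% 'file' ReadSize: each read returns all of the data in one file.
ds.ReadSize = 'file';
reset(ds);            % precaution; the mode switch should already reset
whole = read(ds);

delete(csvFile);
```

Inspecting rowsPerRead on a real, large file is a quick way to check how far the actual rows per read fall below the requested ReadSize.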