What is the optimal block size for importing a big .csv with textscan()
I have .csv files as large as ~17 GB and only 8 GB of RAM, so I import them in blocks. I noticed that importing as many lines as possible per call (and thus using fewer iterations) is not optimal.
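For readers unfamiliar with the pattern, block-wise import with textscan() can be sketched roughly as follows. This is only an illustration under assumptions: the tiny demo file, its two-column format, and the block size of 4 are hypothetical and exist only so the snippet runs on its own; they are not the real data or settings.

```matlab
% Hypothetical demo file (NOT the real data), written so the sketch is runnable.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'sym,qty\n');        % one header line
fprintf(fid, 'A,%d\n', 1:10);     % ten data lines
fclose(fid);

blockSize = 4;                    % lines per textscan call (assumed value)
total = 0;                        % running count of imported lines

fid = fopen(fname);
fgetl(fid);                       % skip the header line once
while ~feof(fid)
    % Read at most blockSize lines; textscan resumes where the last call stopped.
    C = textscan(fid, '%s%u32', blockSize, 'Delimiter', ',');
    total = total + numel(C{2});  % process or store the block here
end
fclose(fid);
delete(fname);                    % remove the demo file
```

Only one block is held in memory at a time, so the block size trades per-call overhead against memory use.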
Here is the test, run on Win7 64-bit, i7-2600, R2013a:
n = 50;                                     % timed textscan calls per block size
opt = {'Delimiter',',','HeaderLines',1};
N = [500 1e3 5e3 1e4:1e4:1e5 2e5:1e5:1e6];  % block sizes (lines per call)
t = zeros(n,numel(N));
for jj = 1:numel(N)                         % N has 22 elements, so 1:23 would run out of bounds
    disp(N(jj))
    fid = fopen('C:\TAQ\8e1e9fb052f2b2b6.csv');
    for ii = 1:n
        tic
        foo = textscan(fid, '%s%u32%u8:%u8:%u8%f32%u32%u16%u16%s%c', N(jj), opt{:});
        t(ii,jj) = toc;
    end
    fclose(fid);
end
The results (y-axis: seconds, x-axis: number of lines imported):
QUESTION: Do you find these results unusual, and what might cause the substantial increase after 1e5? I/O buffer?
Note: 1e6 lines correspond to roughly 40 MB.
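Since the time per call naturally grows with the block size, it may be easier to compare block sizes by the cost per imported line. A minimal sketch, assuming t and N come from the test above; the placeholder branch fills t with random numbers only so the snippet runs standalone, and those values are not measurements. The names tMed, usPerLine and best are introduced here for illustration.

```matlab
% Use t and N from the timing test if they exist in the workspace.
if ~exist('t', 'var') || ~exist('N', 'var')
    % Placeholder values so the sketch runs on its own; NOT real timings.
    N = [500 1e3 5e3 1e4:1e4:1e5 2e5:1e5:1e6];
    t = rand(50, numel(N));
end

tMed      = median(t, 1);          % median seconds per call, one value per block size
usPerLine = 1e6 * tMed ./ N;       % microseconds per imported line
[~, best] = min(usPerLine);        % index of the cheapest block size per line

fprintf('Lowest cost per line at N = %d lines per call\n', N(best));
```

If the per-line cost is flat up to 1e5 and rises afterwards, the increase is not just the larger amount of data per call.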