Sift through large amounts of data from csv

How am I suppose to open a csv with large amount of data?
By the way I need to randomly select by certain options
Can anyone help me open the csv and teach me how to select randomly?

답변 (1개)

Shadaab Siddiqie
Shadaab Siddiqie 2021년 6월 17일

0 개 추천

From my understanding you want to import and operate on large csv file. If the csv file too large for your system to handle, I workaround would be that you can break it down into block, e.g. 500MB. Here is the code which might help you:
blockSize = 500e6 ; % Choose large enough so not too many blocks.
tailSize = 100 ; % Choose large enough so larger than one value representation.
% - Open file.
fId = fopen( 'largeFile.csv' ) ;
% - Read first line, convert to double, determine #columns.
line = fgetl( fId ) ;
data = sscanf( line, '%f,' ) ;
nCols = numel( data ) ;
lastBit = '' ;
while ~feof( fId )
% - Read and pre-process block.
buffer = fread( fId, blockSize, '*char' ) ;
isLast = length( buffer ) < blockSize ;
buffer(buffer==10) = ',' ;
buffer(buffer==13) = '' ;
% - Pre-pend last bit of last block.
if ~isempty( lastBit )
buffer = [lastBit; buffer] ; %#ok<AGROW>
end
% - Truncate to last ',' and keep last bit for next iteration.
if ~isLast
n = find( buffer(end-tailSize:end)==',', 1, 'last' ) ;
cutAt = length(buffer) - tailSize + n ;
lastBit = buffer(cutAt:end) ;
buffer(cutAt:end) = [] ;
end
% - Parse.
data = [data; sscanf( buffer, '%f,' )] ; %#ok<AGROW>
end
% - Close file.
fclose( fId ) ;
% - Reshape data vector -> array.
data = reshape( data, nCols, [] )' ;
There might be many other approaches, you can refer readmatrix and testscan for more information.

카테고리

도움말 센터File Exchange에서 Large Files and Big Data에 대해 자세히 알아보기

질문:

2021년 5월 6일

답변:

2021년 6월 17일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by