Read csv file containing text and numbers for very large dataset (beyond xls limit)
I’m trying to import a very large dataset into MATLAB from a csv file. The file contains a mixture of numerical and string data. Some example rows are shown below:
-15.37 32.83 408.08 1064 -2.35 2.913 -2.31E-05 1E+11
-15.19 -3.624 409.38 1083 -9.81 3.480 4.23E-05 undefined
-15.95 8.534 291.05 993 -133.1 6.866 -2.42E-03 undefined
-15.41 6.975 697.38 686 102.9 8.746 6.24E-03 2.4E+09
I want to import all the data and either replace the undefined values with NaNs or completely remove every row that contains an undefined value. The csvread function in MATLAB expects only numerical values, so it doesn’t work with this dataset. xlsread only reads a finite number of rows, and this dataset exceeds the Excel row limit, so some data points are not read. I’ve tried importdata, but it stops reading at the first row with an undefined value.
I have found a workaround using readtable followed by table2array and str2double, but it is very time consuming. For a dataset of around 1.1 million rows it takes 4 to 5 minutes to read the data into MATLAB, whereas csvread takes seconds (if all data is numeric). Does anyone know a faster way to read the mixed csv data file in as a matrix?
Answers (1)
Jan
9 Apr 2019
Edited: Jan
10 Apr 2019
Please mention what "very large" means. Some people already call a 1 MB file "large" because they cannot open it anymore; others consider files under 500 TB small because that is the size of their scratch disk. Is your file around 100 MB? Then:
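C = fileread(FileName);              % read the whole file as one char vector
C = strrep(C, 'undefined', 'NaN');   % sscanf can parse the text NaN
D = sscanf(C, '%g ', Inf);           % all values as one column vector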
By the way, text files are useful if they are read or edited by humans. Huge text files are therefore a design flaw.
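As a possible alternative to the string replacement (not part of the answer above), textscan has a 'TreatAsEmpty' option that treats a given text as an empty numeric field, and empty numeric fields are filled with NaN by default. A minimal sketch, assuming 8 whitespace-separated numeric columns as in the example rows; the temporary file and the variable names are only for illustration:

```matlab
% Write two of the example rows to a temporary file (illustration only).
FileName = [tempname, '.txt'];
fid = fopen(FileName, 'w');
fprintf(fid, '-15.41 6.975 697.38 686 102.9 8.746 6.24E-03 2.4E+09\n');
fprintf(fid, '-15.95 8.534 291.05 993 -133.1 6.866 -2.42E-03 undefined\n');
fclose(fid);

% Read 8 numeric columns; the text 'undefined' is treated as an empty field,
% which textscan fills with NaN by default.
fid = fopen(FileName, 'r');
C = textscan(fid, repmat('%f', 1, 8), ...
    'TreatAsEmpty', 'undefined', ...
    'CollectOutput', true);       % collect the 8 columns into one matrix
fclose(fid);
delete(FileName);

D = C{1};                         % N-by-8 double matrix
```

Whether this is faster than the sscanf approach for 1.1 million rows would have to be timed on the real file.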
Comments: 2
Jan
10 Apr 2019
If you have not used fileread before, then this is simply the first time you use it.
fileread returns a char vector, and sscanf converts it to a double array. sscanf returns the values in file order as one column, so to get one row per line of the file, either reshape the result to 8 rows and transpose it:
D = reshape(D, 8, []).';
or specify the size inside sscanf already:
D = sscanf(C, '%g ', [8, Inf]).';
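Putting the steps together, a minimal end-to-end sketch could look like this. It assumes 8 whitespace-separated values per row, as in the example rows of the question; the temporary file and the variable Dclean are only for illustration:

```matlab
% Write two of the example rows to a temporary file (illustration only).
FileName = [tempname, '.txt'];
fid = fopen(FileName, 'w');
fprintf(fid, '-15.37 32.83 408.08 1064 -2.35 2.913 -2.31E-05 1E+11\n');
fprintf(fid, '-15.19 -3.624 409.38 1083 -9.81 3.480 4.23E-05 undefined\n');
fclose(fid);

C = fileread(FileName);                % whole file as one char vector
C = strrep(C, 'undefined', 'NaN');     % make the text parseable as a number
D = sscanf(C, '%g', [8, Inf]).';       % 8 values per file row, N-by-8 result
Dclean = D(~any(isnan(D), 2), :);      % optional: drop rows containing NaN
delete(FileName);
```

If the real file is comma separated, one option is to replace the commas with spaces using strrep before calling sscanf.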