Cannot Load CSV file

Question


Screen Shot 2018-07-31 at 19.52.46.png

I am trying to load a CSV file using the Import Tool.

It takes forever (a whole weekend was not enough...).

I've included a screenshot of what I am doing.

The file has numbers from H2 to AEQ639774, headers from A1 to AEQ1, and identifiers from A2 to G639774.

I first tried to load only the numbers into a numeric matrix, planning to repeat the process for the headers and identifiers separately, but not even that works.

The file is 1.28 GB, so it is big, but not that big.

My machine has 16 GB of RAM, so that should be enough.
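As a rough check on that expectation, the size of the numeric block follows directly from the cell ranges given in the question. This sketch only converts the column letters to indices and multiplies out the storage for the numbers themselves; it ignores the header and identifier text and any intermediate copies the Import Tool may make, so actual peak memory is likely higher. The `colIndex` helper is a hypothetical name used only for illustration.

```matlab
% Hypothetical helper: convert a spreadsheet column label (e.g. 'AEQ') to its
% 1-based index, treating the letters as base-26 digits with A=1 ... Z=26.
colIndex = @(label) polyval(double(upper(label)) - double('A') + 1, 26);

firstNumCol = colIndex('H');              % first numeric column
lastNumCol  = colIndex('AEQ');            % last column in the file
nNumCols    = lastNumCol - firstNumCol + 1;
nDataRows   = 639774 - 2 + 1;             % rows 2..639774 (row 1 holds headers)

bytesDouble = nDataRows * nNumCols * 8;   % 8 bytes per double
bytesSingle = nDataRows * nNumCols * 4;   % 4 bytes per single
fprintf('%d x %d numeric block: %.2f GB as double, %.2f GB as single\n', ...
        nDataRows, nNumCols, bytesDouble / 1e9, bytesSingle / 1e9);
% prints: 639773 x 816 numeric block: 4.18 GB as double, 2.09 GB as single
```

So the decoded numbers alone take roughly a quarter of the 16 GB before any parsing overhead is counted.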

I am probably doing something wrong!

Thanks in advance!


Accepted Answer

Adam Danz, 31 July 2018


That sounds fishy. What version of MATLAB are you using? I assume the problem persists after you exit and restart MATLAB.

You could try rehashing the toolbox cache (rehash toolboxcache) in case third-party toolboxes are interfering.

You could use an alternative method of importing the data, such as xlsread(), which bypasses some of the processing done by the Import Tool.


romulo alves, 31 July 2018

Edited: romulo alves, 31 July 2018


So, if I call xlsread('DOT.csv','H7:T20') to extract just a small piece of the numeric part, I get the message:

Unable to read XLS file "path" File is not in recognized format.
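A plausible cause, if Excel for Windows is not available (for example on macOS), is that xlsread runs in basic mode, which reads only XLS-family files (XLS, XLSX, XLSM, XLTX, XLTM) and not CSV. One hedged alternative for pulling out just the H7:T20 block from a CSV is textscan with skip specifiers: `%*s` discards the seven identifier columns A:G, thirteen `%f` fields read H:T, and `%*[^\n]` drops the rest of each line. The sketch first writes a small hypothetical stand-in file (named `DOT_range_sample.csv`, where each cell holds 100*row + column) so it runs on its own; for the real data, point `fileName` at `DOT.csv`. Identifiers containing quoted commas would need different handling.

```matlab
% --- Hypothetical stand-in file (25 rows x 30 columns) so the sketch runs. ---
% The cell in spreadsheet row r, column c holds the value 100*r + c.
fileName = 'DOT_range_sample.csv';
fid = fopen(fileName, 'w');
hdr = sprintf('h%d,', 1:30);
fprintf(fid, '%s\n', hdr(1:end-1));              % header row
for r = 2:25
    ids  = sprintf('id%d,', r * ones(1, 7));     % 7 identifier fields (A:G)
    vals = sprintf(',%d', 100*r + (8:30));       % numeric fields H onward
    fprintf(fid, '%s%s\n', ids(1:end-1), vals);
end
fclose(fid);

% --- Read the equivalent of H7:T20. ---
firstRow = 7;  lastRow = 20;   % spreadsheet rows (row 1 = header)
firstCol = 8;  lastCol = 20;   % H = 8, T = 20
fmt = [repmat('%*s', 1, firstCol - 1), ...          % skip columns A:G
       repmat('%f', 1, lastCol - firstCol + 1), ... % read columns H:T
       '%*[^\n]'];                                  % discard the rest of the line

fid = fopen(fileName, 'r');
C = textscan(fid, fmt, lastRow - firstRow + 1, ...
             'Delimiter', ',', 'HeaderLines', firstRow - 1, ...
             'CollectOutput', true);
fclose(fid);
block = C{1};                  % 14-by-13 double
```

Because textscan stops after the requested number of rows, it never parses the remaining 600,000+ lines, which makes this a quick way to test the format on a small piece first.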

If I do:

chunk_nRows = 2e4 ;
 % - Open file.
 fId  = fopen( 'DOT.csv' ) ;
 % - Read first line, convert to double, determine #columns.
 line  = fgetl( fId ) ;
 row   = sscanf( line, '%f,' )' ;
 nCols = numel( row ) ;
 % - Prealloc data, copy first row, init loop counter.
 data      = zeros( chunk_nRows, nCols ) ;
 data(1,:) = row ;
 rowCnt    = 1 ;
 % - Loop over rest of the file.
 while ~feof( fId )
    rowCnt = rowCnt + 1 ;
    % - Realloc + a chunk if rowCnt larger than data array.
    if rowCnt > size( data, 1 )
        fprintf( 'Realloc ..\n' ) ;
        data(size(data, 1)+chunk_nRows, nCols) = 0 ;
    end
    % - Read line, convert and store.
    line = fgetl( fId ) ;
    data(rowCnt,:) = sscanf( line, '%f,' )' ;
 end
 % - Truncate data to last row (truncate last chunk).
 data = data(1:rowCnt,:) ;
 % - Close file.
 fclose( fId ) ;

I get the message

Subscript indices must either be real positive integers or logicals.

I checked, and the code stops when

rowCnt = 20001
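A likely explanation for the failure at exactly rowCnt = 20001 (a reading of the posted code, not something confirmed in the thread): the first line of DOT.csv is the header row, so sscanf(line, '%f,') finds no leading number and returns an empty result, which makes nCols zero. zeros(chunk_nRows, 0) and the per-row assignments then succeed silently (presumably because the data lines begin with text identifiers, so sscanf also returns empty for them) until the first reallocation indexes column 0. The snippet reproduces this with a hypothetical header line and a small chunk size, and shows one possible way to size the array from the header's field count instead.

```matlab
% Hypothetical header line standing in for the first line of DOT.csv.
headerLine = 'id,name,x1,x2';

row   = sscanf(headerLine, '%f,')';   % no leading number, so the result is empty
nCols = numel(row);                   % 0 columns

chunk_nRows = 3;                      % small chunk for the demo
data = zeros(chunk_nRows, nCols);     % chunk_nRows-by-0: no error yet

failed = false;
try
    % Same growth step as the posted loop: the column index nCols is 0.
    data(size(data, 1) + chunk_nRows, nCols) = 0;
catch err
    failed = true;
    disp(err.message)
end

% One possible fix: count the header's fields instead of parsing it as numbers.
nFields = numel(strfind(headerLine, ',')) + 1;   % 4 fields in this header
```

Even with the column count fixed, each data line would still need its seven identifier fields skipped before the numbers are parsed.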

Walter Roberson, 31 July 2018

The 'e' and 'r' entries are probably the reason that most of the numbers are being read as if they were strings.

What do you want done with the 'e' and 'r' entries? Is it okay to treat both of them the same way as empty cells, converting all three kinds of entry to NaN?
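If mapping 'e', 'r' and empty cells all to NaN is acceptable, textscan can do it during parsing through its TreatAsEmpty and EmptyValue options, so the numeric columns never have to be read as text. A minimal sketch on a hypothetical two-row sample string with two identifier columns and three numeric columns (the real file has 7 identifier columns and 816 numeric ones):

```matlab
% Hypothetical sample: 2 identifier columns, then 3 numeric columns that
% contain an empty field and the 'e' / 'r' markers.
sample = sprintf('A1,foo,1.5,,e\nA2,bar,r,2,3\n');

C = textscan(sample, '%s %s %f %f %f', 'Delimiter', ',', ...
             'TreatAsEmpty', {'e', 'r'}, ...  % these placeholders count as empty
             'EmptyValue', NaN);              % empty numeric fields become NaN

ids  = [C{1}, C{2}];   % 2-by-2 cell array of identifier text
nums = [C{3:5}];       % 2-by-3 double; 'e', 'r' and the empty field are NaN
```

TreatAsEmpty only applies to numeric fields, so an identifier that happens to be 'e' or 'r' would still be kept as text.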

Walter Roberson, 1 August 2018

The file turns out to be UTF-8 encoded, because it contains accented characters at various points. That causes some problems.

I started by reading the entire file in at once to process it as a single string (there can be a lot of advantages to working that way), but I ran into a MathWorks bug in native2unicode once the decoded text reached 1 gigabyte.
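One way to avoid holding the whole decoded file in memory is to stream it in fixed-size blocks with textscan, declaring the encoding when the file is opened and mapping 'e', 'r' and empty fields to NaN. This is a sketch, not the approach used in the thread. It assumes the layout from the question (one header row, 7 identifier columns, then numeric columns), that any identifier containing a comma is double-quoted, and it first writes a tiny hypothetical stand-in file (`DOT_sample.csv`, 3 numeric columns) so it runs on its own. Storing the numbers as single is an optional trade of precision for memory.

```matlab
% --- Hypothetical stand-in file so the sketch runs on its own. ---
% For the real data use: fileName = 'DOT.csv'; nNumCols = 816;
fileName = 'DOT_sample.csv';
nIdCols  = 7;                 % identifier columns A:G
nNumCols = 3;                 % numeric columns (816 for H:AEQ)
fid = fopen(fileName, 'w');
hdr = sprintf('c%d,', 1:nIdCols + nNumCols);
fprintf(fid, '%s\n', hdr(1:end-1));
fprintf(fid, 'a1,b,c,d,e1,f,g,1.5,e,2\n');
fprintf(fid, 'a2,b,c,d,e2,f,g,r,,3\n');
fclose(fid);

% --- Block-wise reader. ---
blockRows = 50000;            % rows per textscan call; tune to available memory
fmt = [repmat('%q', 1, nIdCols), repmat('%f', 1, nNumCols)];

fid = fopen(fileName, 'r', 'n', 'UTF-8');     % the real file is UTF-8 encoded
headers = regexp(fgetl(fid), ',', 'split');   % naive split: assumes no quoted commas

idBlocks = {};
numBlocks = {};
while ~feof(fid)
    C = textscan(fid, fmt, blockRows, 'Delimiter', ',', ...
                 'TreatAsEmpty', {'e', 'r'}, 'EmptyValue', NaN, ...
                 'CollectOutput', true);
    if isempty(C{1}), break; end
    idBlocks{end+1}  = C{1};            %#ok<SAGROW> N-by-nIdCols cell of text
    numBlocks{end+1} = single(C{2});    %#ok<SAGROW> N-by-nNumCols, as single
end
fclose(fid);

ids  = vertcat(idBlocks{:});
nums = vertcat(numBlocks{:});
```

Only one block of text is decoded at a time, so the size of the working set is set by blockRows rather than by the 1.28 GB file.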

