textscan does not read all rows

Ricardo Lopez A. on 19 Jan 2020
Edited: Jeremy Hughes on 20 Jan 2020
Hi,
I am dealing with very large .txt files and trying to use textscan to open them. I have a smaller .txt file with the same format that I was able to open with readtable. The resulting table has 47 variables and 1389712 rows.
Here is the readtable code:
data=readtable('Building.txt');
Here is the textscan code:
formatSpec='%s%f%s%s%s%s%f%f%f%f%f%s%s%f%f%f%f%s%f%f%f%f%f%f%f%f%f%s%f%s%s%s%s%s%s%s%f%f%s%s%s%f%s%f%s%f%f';
fid = fopen('Building.txt','r');
data1 = textscan(fid,formatSpec,'Delimiter','|');
fclose(fid);
data1 has 47 variables, but only 36299 rows instead of 1389712. I would use readtable, but it is far too slow for the large .txt files.
Please note that I obtained the formatSpec from the readtable result: calling summary(data) showed me the type of each variable.
This is an example of the format of the text files I am trying to use (lots of missing data, I know):
EE760424-42D5-E511-80C1-3863BB43AC67|0||RESIDENTIAL STRUCTURE||RR000|||1||||||||| |0|0||0||.00||.00||C|0|||||||||||||||99748186| |38001|7017
EF760424-42D5-E511-80C1-3863BB43AC67|0||RESIDENTIAL STRUCTURE||RR000|||1||||||||| |0|0||0||.00||.00||C|0|||||||||||||||99748257| |38001|7017
Thanks a lot!
Comments (1)
dpb on 19 Jan 2020
Are you sure there aren't missing values in the readtable table? readtable is much more forgiving of a bad format or missing data than textscan is.
There's not much anybody can do here without a sample file to work on. It should zip up pretty compactly.


Answers (1)

Jeremy Hughes on 20 Jan 2020
Edited: Jeremy Hughes on 20 Jan 2020
If you pass 'ReturnOnError',false in the textscan call, you will get an error message at the point where the format cannot read your file. That's likely due to the missing data.
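To make the difference concrete, here is a small sketch. It does not use your file: the three-column sample, the shortened format, and the temporary file name are all made up for illustration. The first call uses the default behavior and stops quietly at the bad field; the second call uses 'ReturnOnError',false and raises an error instead.

```matlab
% Hypothetical 3-column sample standing in for Building.txt (an assumption);
% the third row has text where the format expects a number.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'A|1|x\nB||y\nC|oops|z\nD|4|w\n');
fclose(fid);

fmt = '%s%f%s';

% Default ('ReturnOnError',true): textscan stops at the bad field
% and returns whatever it managed to read, without complaining.
fid = fopen(fname,'r');
[C,pos] = textscan(fid,fmt,'Delimiter','|');
fclose(fid);
info = dir(fname);
fprintf('Scan ended at byte %d of %d\n',pos,info.bytes);

% 'ReturnOnError',false: the same mismatch raises an error instead.
fid = fopen(fname,'r');
try
    C2 = textscan(fid,fmt,'Delimiter','|','ReturnOnError',false);
catch err
    disp(err.message)   % the message points at the field that failed
end
fclose(fid);
delete(fname)
```

If the end position from the first call is smaller than the file size, the scan stopped early. On the real file, that check plus the error message should narrow down which row and field break the format.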
readtable tries to read using a detected format and, when that fails, re-reads with an updated format. It may be slow because it reads the file multiple times trying to get the format right. You could pass that same formatSpec into readtable, but it will likely fail in the same way as textscan (just not silently).
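Passing the format to readtable would look roughly like the sketch below. Again this uses a made-up three-column sample and a shortened format rather than your 47-column file, and 'ReadVariableNames',false is an assumption based on your sample lines having no header row.

```matlab
% Hypothetical 3-column sample standing in for Building.txt (an assumption).
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'A|1|x\nB||y\nC|3|z\n');
fclose(fid);

% With 'Format' supplied, readtable uses it as given instead of
% detecting the variable types itself.
T = readtable(fname,'Delimiter','|','Format','%s%f%s', ...
    'ReadVariableNames',false);
disp(height(T))     % number of rows actually read
delete(fname)
```

For the real file you would substitute 'Building.txt' and the full 47-field formatSpec, then compare height(T) against the row count you expect.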
If you run detectImportOptions on the file and then pass the result to readtable, you might get faster/better results.
file = 'Building.txt';
opts = detectImportOptions(file,'Delimiter','|','ExpectedNumVariables',47)
%% Check if this looks right
tp = preview(file,opts)
%% If the variable types look correct in tp, you don't need this step.
formatSpec='%s%f%s%s%s%s%f%f%f%f%f%s%s%f%f%f%f%s%f%f%f%f%f%f%f%f%f%s%f%s%s%s%s%s%s%s%f%f%s%s%s%f%s%f%s%f%f';
fmt = split(formatSpec(2:end),'%');
opts = setvartype(opts,strcmp(fmt,'f'),'double');
opts = setvartype(opts,strcmp(fmt,'s'),'char');
%% Read the whole file.
T = readtable(file,opts);
I can't really test this without your file, but it should work (maybe with some tweaking).

Release

R2017a
