The efficient way (in terms of speed and consistency) for parsing a big text file with textscan

Question

sermet OGUTCU on 15 May 2021


https://kr.mathworks.com/matlabcentral/answers/830813-the-efficient-way-in-terms-of-speed-and-consistency-for-parsing-a-big-text-file-with-textscan

Commented: sermet OGUTCU on 15 May 2021

I have a text file consisting of approximately 500,000 lines. The header and some of the data records are shown below.

#dP2021  4  3  0  0  0.00000000     288   u+U IGb14 FIT  GFZ                    
## 2151 518400.00000000   300.00000000 59307 0.0000000000000                                
++        10 10 10  6 10  6 10  8  8  8  8  8  8  8  8  8  8                    
++         8  8  8  8  8  8  8  8  8  8  8  8  8  6  6  8 10                                   
%c M  cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc                                
%i    0    0    0    0      0      0      0      0         0                    
%i    0    0    0    0      0      0      0      0         0                    
/* PCV:IGS14_2148 OL/AL:FES2004  NONE     YN CLK:CoN ORB:CoN                    
/*     GeoForschungsZentrum Potsdam                                             
/*                                                                              
/*                                                                              
*  2021  4  3  0  0  0.00000000                                                 
PC01 -34381.586112  24435.438444     69.245923   -596.854622                    
PE02   4493.250988  41924.015694   -226.819605    790.650809                    
PG03 -14754.803607  39520.337126   -938.295010   -436.165931                    
PG04 -39584.473454  14533.059977   -388.137635    370.305833                    
.
.
.
*  2021  4  3  0  5  0.00000000
PC01 -34381.437242  24436.228124     74.813357   -596.843988                    
PE02   4493.541869  41922.959643   -254.934261    790.641523                    
PG03 -14753.360421  39519.882073   -951.586932   -436.156224                    
PG04 -39584.568840  14533.349312   -380.297839    370.469467                    
.
.
.
EOF

I need to count, separately, all the PC[0-9][0-9], PE[0-9][0-9], and PG[0-9][0-9] strings in the first column of the data section, which follows the header section and the date lines. What is an efficient way of doing this using textscan?

Comments: 5

sermet OGUTCU on 15 May 2021


Dear @Jan, the output format is not important, but it will be created as a string array such as:

output=["PC" "120";"PG" "200";"PE" "110"]

Sulaymon Eshkabilov on 15 May 2021

You can also try fscanf(), which works similarly to textscan() and uses largely the same format specifiers.


Answer 1

Jan on 15 May 2021


https://kr.mathworks.com/matlabcentral/answers/830813-the-efficient-way-in-terms-of-speed-and-consistency-for-parsing-a-big-text-file-with-textscan#answer_700468

Edited: Jan on 15 May 2021


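% Read the whole file into memory and split it into lines.
Str = fileread(FileName);
C   = strsplit(Str, '\n');
% Count the lines that start with each two-letter prefix.
nPC = sum(strncmp(C, 'PC', 2));
nPG = sum(strncmp(C, 'PG', 2));
nPE = sum(strncmp(C, 'PE', 2));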
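The comments ask for the counts as a string array and for a match on the prefix followed by two digits. A minimal sketch of one way to do both is given here; it is not part of Jan's answer. The sample text stands in for the output of fileread, and the names prefixes, counts, and pattern are illustrative assumptions.

```matlab
% Sample text standing in for fileread(FileName); the records are copied
% from the question, everything else here is an illustrative assumption.
Str = sprintf('%s\n', ...
    '*  2021  4  3  0  0  0.00000000', ...
    'PC01 -34381.586112  24435.438444     69.245923   -596.854622', ...
    'PE02   4493.250988  41924.015694   -226.819605    790.650809', ...
    'PG03 -14754.803607  39520.337126   -938.295010   -436.165931', ...
    'PG04 -39584.473454  14533.059977   -388.137635    370.305833', ...
    'EOF');

C = strsplit(Str, '\n');          % one cell per line
prefixes = {'PC', 'PG', 'PE'};
counts = zeros(size(prefixes));

for k = 1:numel(prefixes)
    % Match the prefix followed by two digits at the start of a line.
    pattern = ['^' prefixes{k} '\d\d'];
    hits = regexp(C, pattern, 'once');        % empty where no match
    counts(k) = sum(~cellfun(@isempty, hits));
end

% Two-column string array: prefix in column 1, count in column 2.
output = [string(prefixes(:)), string(counts(:))];
```

The regexp test is stricter than strncmp(C, 'PC', 2), which would also count any other line that happens to begin with the same two letters.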

If the file does not fit into your RAM:

fid = fopen(FileName, 'r');
if fid == -1
    error('Cannot open file: %s', FileName);
end
nPC = 0;
nPG = 0;
nPE = 0;
while ~feof(fid)
   s = fgets(fid);   % read one line at a time to keep memory use low
   if strncmp(s, 'PC', 2)
       nPC = nPC + 1;
   elseif strncmp(s, 'PG', 2)
       nPG = nPG + 1;
   elseif strncmp(s, 'PE', 2)
       nPE = nPE + 1;
   end
end
fclose(fid);
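The question asks specifically about textscan, which the code above does not use. The following is a hedged sketch of a textscan-based variant: it reads each line as one cell entry and then counts prefixes with strncmp as before. The in-memory sample and the variable name textLines are assumptions made for illustration, and no claim is made here about its speed relative to the other approaches.

```matlab
% In-memory sample standing in for the file (records copied from the
% question). With a real file, pass the identifier returned by fopen to
% textscan instead of the char vector, and call fclose afterwards.
Str = sprintf('%s\n', ...
    '/*     GeoForschungsZentrum Potsdam', ...
    '*  2021  4  3  0  0  0.00000000', ...
    'PC01 -34381.586112  24435.438444     69.245923   -596.854622', ...
    'PE02   4493.250988  41924.015694   -226.819605    790.650809', ...
    'PG03 -14754.803607  39520.337126   -938.295010   -436.165931', ...
    'PG04 -39584.473454  14533.059977   -388.137635    370.305833', ...
    '*  2021  4  3  0  5  0.00000000', ...
    'PC01 -34381.437242  24436.228124     74.813357   -596.843988', ...
    'PG03 -14753.360421  39519.882073   -951.586932   -436.156224', ...
    'EOF');

% Newline is the only delimiter and no characters are skipped as
% whitespace, so each line comes back unchanged as one cell entry.
C = textscan(Str, '%s', 'Delimiter', '\n', 'Whitespace', '');
textLines = C{1};

% Header, date, and EOF lines never start with these prefixes, so they
% do not need to be removed before counting.
nPC = sum(strncmp(textLines, 'PC', 2));
nPG = sum(strncmp(textLines, 'PG', 2));
nPE = sum(strncmp(textLines, 'PE', 2));
```

Because the header length is not fixed, this sketch filters by prefix rather than relying on the HeaderLines option.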