The most efficient way (in terms of speed and consistency) to parse a large text file with textscan

I have a text file of approximately 500,000 lines. The header and some of the data sections are shown below.
#dP2021 4 3 0 0 0.00000000 288 u+U IGb14 FIT GFZ
## 2151 518400.00000000 300.00000000 59307 0.0000000000000
++ 10 10 10 6 10 6 10 8 8 8 8 8 8 8 8 8 8
++ 8 8 8 8 8 8 8 8 8 8 8 8 8 6 6 8 10
%c M cc GPS ccc cccc cccc cccc cccc ccccc ccccc ccccc ccccc
%i 0 0 0 0 0 0 0 0 0
%i 0 0 0 0 0 0 0 0 0
/* PCV:IGS14_2148 OL/AL:FES2004 NONE YN CLK:CoN ORB:CoN
/* GeoForschungsZentrum Potsdam
/*
/*
* 2021 4 3 0 0 0.00000000
PC01 -34381.586112 24435.438444 69.245923 -596.854622
PE02 4493.250988 41924.015694 -226.819605 790.650809
PG03 -14754.803607 39520.337126 -938.295010 -436.165931
PG04 -39584.473454 14533.059977 -388.137635 370.305833
.
.
.
* 2021 4 3 0 5 0.00000000
PC01 -34381.437242 24436.228124 74.813357 -596.843988
PE02 4493.541869 41922.959643 -254.934261 790.641523
PG03 -14753.360421 39519.882073 -951.586932 -436.156224
PG04 -39584.568840 14533.349312 -380.297839 370.469467
.
.
.
EOF
I need to count the PC[0-9][0-9], PE[0-9][0-9], and PG[0-9][0-9] strings separately in the first column of the data section, i.e. after the header and the epoch (date) lines. What is the most efficient way to do this using textscan?
Comments: 5
sermet OGUTCU on 15 May 2021
Dear @Jan, the output format is not important, but it could be created as a string array, for example:
output=["PC" "120";"PG" "200";"PE" "110"]
Sulaymon Eshkabilov on 15 May 2021
You could also test fscanf(), which works in a similar way to textscan() and uses similar format specifiers.


Accepted Answer

Jan on 15 May 2021 (edited by Jan on 15 May 2021)
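% Read the whole file into memory and split it into lines
Str = fileread(FileName);
C = strsplit(Str, '\n');
% Count the lines whose first two characters are PC, PG or PE
nPC = sum(strncmp(C, 'PC', 2));
nPG = sum(strncmp(C, 'PG', 2));
nPE = sum(strncmp(C, 'PE', 2));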
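The strncmp calls above count every line that starts with the two letters. If you want to match exactly the PC[0-9][0-9] pattern from the question and also get the string-array output sermet described, a minimal sketch (assuming the file fits in memory; countSatellitesRegexp is a hypothetical helper name, not a MATLAB function) could use regexp with the 'lineanchors' option so that ^ matches at the start of every line:

```matlab
function output = countSatellitesRegexp(FileName)
% Hypothetical helper: count lines that start with PCnn, PGnn or PEnn
% (two system letters followed by two digits) and return the counts as a
% string array. Assumes the whole file fits into memory.
Str = fileread(FileName);
prefixes = {'PC', 'PG', 'PE'};
counts = zeros(1, numel(prefixes));
for k = 1:numel(prefixes)
    % 'lineanchors' makes ^ match at the start of each line, not only the file;
    % 'start' returns one index per match, so numel gives the match count
    counts(k) = numel(regexp(Str, ['^' prefixes{k} '\d\d'], 'start', 'lineanchors'));
end
% 3-by-2 string array: first column = prefix, second column = count
output = [string(prefixes(:)), string(counts(:))];
end
```

Lines such as "/* PCV:IGS14_2148 ..." in the header are not counted, because the pattern requires the prefix and two digits at the very beginning of the line.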
If the file does not fit into your RAM, read it line by line:
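fid = fopen(FileName, 'r');
if fid < 0
    error('Cannot open %s', FileName);
end
nPC = 0;
nPG = 0;
nPE = 0;
while ~feof(fid)
    s = fgets(fid);   % one line; fgets returns -1 at end of file, which strncmp rejects
    if strncmp(s, 'PC', 2)
        nPC = nPC + 1;
    elseif strncmp(s, 'PG', 2)
        nPG = nPG + 1;
    elseif strncmp(s, 'PE', 2)
        nPE = nPE + 1;
    end
end
fclose(fid);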
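Since the question asks about textscan specifically, here is a minimal sketch of the same chunked idea using textscan instead of fgets. It is an assumption-based illustration, not part of the accepted answer: countSatellitesTextscan and chunkSize are hypothetical names, and the default of 100000 lines per call is an arbitrary choice. textscan with the format '%s' and 'Delimiter', '\n' reads whole lines, and the repeat count N limits how many lines are held in memory at once:

```matlab
function output = countSatellitesTextscan(FileName, chunkSize)
% Hypothetical helper: count lines starting with PCnn, PGnn or PEnn,
% reading the file in chunks with textscan so it never sits in memory whole.
if nargin < 2
    chunkSize = 100000;                 % lines per textscan call (arbitrary default)
end
fid = fopen(FileName, 'r');
if fid < 0
    error('Cannot open %s', FileName);
end
closer = onCleanup(@() fclose(fid));    % close the file even if an error occurs

prefixes = {'PC', 'PG', 'PE'};
counts = zeros(1, numel(prefixes));
while ~feof(fid)
    % Read up to chunkSize lines; '\n' as the only delimiter keeps each line whole
    C = textscan(fid, '%s', chunkSize, 'Delimiter', '\n');
    lines = C{1};
    if isempty(lines)
        break;                          % nothing left to read
    end
    for k = 1:numel(prefixes)
        % Line must start with the prefix followed by two digits, e.g. PG03
        hit = regexp(lines, ['^' prefixes{k} '\d\d'], 'once');
        counts(k) = counts(k) + sum(~cellfun('isempty', hit));
    end
end
% 3-by-2 string array: first column = prefix, second column = count
output = [string(prefixes(:)), string(counts(:))];
end
```

A larger chunkSize means fewer textscan calls but more lines held in memory at a time; for a 500,000-line file a single call (omitting N) would also work if memory allows.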
