How to use textscan to read my 2nd column and ignore the string or non numerical values?

Hello, I have a huge data file and I was wondering if anyone could help me use textscan to only read the 2nd column but also ignore the strings. The file has this sort of format and the data keeps going.
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
energy
1.0000E-01 5.44426E-10 0.4254
1.2475E-01 1.71665E-10 0.8055
1.4950E-01 8.51003E-11 0.8817
1.7426E-01 2.09602E-10 0.6570
1.9901E-01 2.62823E-10 0.4473
2.2376E-01 3.18821E-11 0.7145
2.4851E-01 4.37107E-11 0.6539
2.7327E-01 1.34258E-10 0.6703
2.9802E-01 1.31857E-10 0.6663
3.2277E-01 4.53330E-11 0.9459
3.4752E-01 9.37144E-13 0.9914
3.7228E-01 2.99698E-10 0.9950
3.9703E-01 7.03990E-18 0.9950
4.2178E-01 5.24669E-16 0.9950
4.4653E-01 4.01338E-29 0.9950
4.7129E-01 0.00000E+00 0.0000
4.9604E-01 0.00000E+00 0.0000
5.2079E-01 0.00000E+00 0.0000
5.4554E-01 0.00000E+00 0.0000
5.7030E-01 0.00000E+00 0.0000
5.9505E-01 0.00000E+00 0.0000
6.1980E-01 9.12419E-29 0.9950
6.4455E-01 0.00000E+00 0.0000
6.6931E-01 2.43906E-25 0.9950
6.9406E-01 0.00000E+00 0.0000
7.1881E-01 0.00000E+00 0.0000
7.4356E-01 0.00000E+00 0.0000
7.6832E-01 1.86537E-12 0.9950
7.9307E-01 0.00000E+00 0.0000
8.1782E-01 2.11385E-11 0.9950
8.4257E-01 0.00000E+00 0.0000
8.6733E-01 0.00000E+00 0.0000
8.9208E-01 0.00000E+00 0.0000
9.1683E-01 0.00000E+00 0.0000
9.4158E-01 9.73682E-13 0.9950
9.6634E-01 0.00000E+00 0.0000
9.9109E-01 0.00000E+00 0.0000
1.0158E+00 0.00000E+00 0.0000
1.0406E+00 0.00000E+00 0.0000
1.0653E+00 4.94059E-40 0.9950
1.0901E+00 0.00000E+00 0.0000
1.1149E+00 0.00000E+00 0.0000
1.1396E+00 0.00000E+00 0.0000
1.1644E+00 0.00000E+00 0.0000
1.1891E+00 0.00000E+00 0.0000
1.2139E+00 0.00000E+00 0.0000
1.2386E+00 0.00000E+00 0.0000
1.2634E+00 0.00000E+00 0.0000
1.2881E+00 0.00000E+00 0.0000
1.3129E+00 0.00000E+00 0.0000
1.3376E+00 6.03842E-10 0.9950
1.3624E+00 0.00000E+00 0.0000
1.3871E+00 0.00000E+00 0.0000
1.4119E+00 0.00000E+00 0.0000
1.4366E+00 0.00000E+00 0.0000
1.4614E+00 0.00000E+00 0.0000
1.4861E+00 0.00000E+00 0.0000
1.5109E+00 0.00000E+00 0.0000
1.5356E+00 0.00000E+00 0.0000
1.5604E+00 0.00000E+00 0.0000
1.5851E+00 0.00000E+00 0.0000
1.6099E+00 0.00000E+00 0.0000
1.6347E+00 0.00000E+00 0.0000
1.6594E+00 0.00000E+00 0.0000
1.6842E+00 0.00000E+00 0.0000
1.7089E+00 0.00000E+00 0.0000
1.7337E+00 0.00000E+00 0.0000
1.7584E+00 0.00000E+00 0.0000
1.7832E+00 0.00000E+00 0.0000
1.8079E+00 0.00000E+00 0.0000
1.8327E+00 0.00000E+00 0.0000
1.8574E+00 0.00000E+00 0.0000
1.8822E+00 0.00000E+00 0.0000
1.9069E+00 0.00000E+00 0.0000
1.9317E+00 0.00000E+00 0.0000
1.9564E+00 0.00000E+00 0.0000
1.9812E+00 0.00000E+00 0.0000
2.0059E+00 0.00000E+00 0.0000
2.0307E+00 0.00000E+00 0.0000
2.0554E+00 0.00000E+00 0.0000
2.0802E+00 0.00000E+00 0.0000
2.1050E+00 0.00000E+00 0.0000
2.1297E+00 0.00000E+00 0.0000
2.1545E+00 0.00000E+00 0.0000
2.1792E+00 0.00000E+00 0.0000
2.2040E+00 0.00000E+00 0.0000
2.2287E+00 0.00000E+00 0.0000
2.2535E+00 0.00000E+00 0.0000
2.2782E+00 0.00000E+00 0.0000
2.3030E+00 0.00000E+00 0.0000
2.3277E+00 0.00000E+00 0.0000
2.3525E+00 0.00000E+00 0.0000
2.3772E+00 0.00000E+00 0.0000
2.4020E+00 0.00000E+00 0.0000
2.4267E+00 0.00000E+00 0.0000
2.4515E+00 0.00000E+00 0.0000
2.4762E+00 0.00000E+00 0.0000
2.5010E+00 0.00000E+00 0.0000
2.5257E+00 0.00000E+00 0.0000
2.5505E+00 0.00000E+00 0.0000
2.5752E+00 0.00000E+00 0.0000
2.6000E+00 0.00000E+00 0.0000
total 2.58911E-09 0.3011
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
uncollided photon flux
energy
1.0000E-01 7.06645E-15 0.9950
1.2475E-01 0.00000E+00 0.0000
1.4950E-01 0.00000E+00 0.0000
4 Comments
John Vargas, 22 Aug 2018
I am sorry, in this file, I had already removed the first two lines of strings which are:
detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01
energy


Accepted Answer

Star Strider, 22 Aug 2018
Your file is not easy to import. I’ve been working on this for a while.
Try this:
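fidi = fopen('M1output1.txt','rt');   % open the data file for text-mode reading
D = {};                               % collects one numeric block per cell
k1 = 1;
while ~feof(fidi)
    % Read one block of numbers, keeping only column 2 (%*f skips a column)
    C = textscan(fidi, '%*f%f%*f', 'HeaderLines',2, 'CollectOutput',true, 'CommentStyle',{' total', ' energy'});
    M = cell2mat(C);
    if isempty(M)                     % nothing was read, so stop
        break
    end
    D{k1,:} = M;                      % store this block
    fseek(fidi, 0, 0);                % seek 0 bytes from the current position so textscan can resume
    k1 = k1 + 1;
end
fclose(fidi);
Column2 = cell2mat(D);                % concatenate all blocks into one column vector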
The textscan format descriptor reads only Column 2, ignoring the other two columns.
The loop is necessary because you have several text lines that interrupt the ordinary file reading process. Every time textscan encounters text where it expects a numeric value, it stops, and it is necessary to use fseek to restart it and read the next block of numbers. Your file also does not have a valid ‘end-of-file’ indicator, so to keep the loop from becoming infinite, it is necessary to test whether the result of the read is empty. If it is, the loop breaks and file reading stops. Since this does not occur until all the data have been read, no data are lost.
The cell2mat call at the end concatenates the cells in ‘D’ into a single vector.
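For comparison, the same idea of skipping any line that is not a row of three numbers can be sketched line by line with sscanf instead of the textscan loop. This is only a minimal sketch, not the method used above: the sample text is an abbreviated stand-in for the file in the question, and the variable names (txt, rows, col2) are made up for illustration. For a real file, the text could be obtained with fileread.

```matlab
% Abbreviated stand-in for the file layout shown in the question (assumed sample).
txt = sprintf(['detector located at x,y,z = 3.97500E+03 3.97500E+03-9.95000E+01\n', ...
               'energy\n', ...
               '1.0000E-01 5.44426E-10 0.4254\n', ...
               '1.2475E-01 1.71665E-10 0.8055\n', ...
               'total 2.58911E-09 0.3011\n', ...
               'uncollided photon flux\n', ...
               'energy\n', ...
               '1.0000E-01 7.06645E-15 0.9950\n']);

rows = regexp(txt, '\r?\n', 'split');   % split the text into individual lines
col2 = zeros(0,1);                      % will hold the 2nd-column values
for k = 1:numel(rows)
    vals = sscanf(rows{k}, '%f');       % stops at the first non-numeric token
    if numel(vals) == 3                 % keep only lines that are exactly three numbers
        col2(end+1,1) = vals(2);        % record the 2nd column
    end
end
```

Lines such as the "detector", "energy" and "total" lines begin with text, so sscanf returns fewer than three numbers for them and they are skipped. This is slower than textscan on very large files, but it does not depend on where the text lines occur.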
3 Comments
Star Strider, 22 Aug 2018
As always, my pleasure.
Yes. The asterisk between the ‘%’ and the type descriptor (here ‘f’) tells textscan to ignore that column. So to read all columns:
'%f%f%f'
to read only the first column and ignore the last two:
'%f%*f%*f'
or in a more readable (and still valid) form:
'%f %*f %*f'
and so for any other combinations you want to import or exclude.
Walter Roberson, 22 Aug 2018
Star Strider used a format of
'%*f%f%*f'
that says to skip the first number, read and record the second, and skip the third number.
So use %*f for any column you want to skip, and %f for any column you want to read in.
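The format specifiers discussed in these comments can be tried on a small in-memory example, since textscan also accepts a character vector in place of a file identifier. The two sample rows are copied from the data in the question; the variable names are only illustrative.

```matlab
% Two sample rows taken from the data in the question.
txt = sprintf('1.0000E-01 5.44426E-10 0.4254\n1.2475E-01 1.71665E-10 0.8055\n');

% Skip column 1, read column 2, skip column 3.
C2 = textscan(txt, '%*f %f %*f');
col2 = C2{1};

% Read column 1 only, skipping the last two.
C1 = textscan(txt, '%f %*f %*f');
col1 = C1{1};

% Read all three columns; the result has one cell per column.
Call = textscan(txt, '%f %f %f');
```

Skipped fields produce no output cell, so C2 and C1 each contain a single cell while Call contains three.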


More Answers (4)

Jeremy Hughes, 22 Aug 2018 (edited 22 Aug 2018)
I suggest trying
opts = detectImportOptions(filename)
T = readtable(filename,opts)
Also, if you want to ignore those lines:
opts = detectImportOptions(filename)
opts.ImportErrorRule = 'omitrow'
T = readtable(filename,opts)

Yuvaraj Venkataswamy, 22 Aug 2018

Walter Roberson, 22 Aug 2018
You can use the CommentStyle option of textscan, specifying {'detector', 'energy'}. With a two-element cell array, textscan ignores any text between the two strings, so this skips the x, y, z coordinates on those lines.

jonas, 22 Aug 2018
Here's another approach with regexprep and textscan:
%% Read the file and remove the annoying intermediate headers
str = fileread('File.txt');
str = regexprep(str,'(total|detector).*?energy','');
%% Read the 2nd column
num = textscan(str,'%*f%f%*f');
out = cell2mat(num)
