Reading numbers from a general text file

mrBrown, 17 June 2011
Hi all,
All morning I've been trying to figure out how to read a text file like this:
This is the textfile I'd like to read.
the only interesting part are the numbers below
1 2
3 4
5 6
It would be nice to have a generic way to do this (e.g. read only
the lines that contain only two numbers).
I've tried using fscanf(fid,'%e %e\n') and regexp(..), but failed to get either working. Since I'm trying to read a large (600 MB+) data file, I don't want to fall back to processing the file line by line. Pre-processing the file with Python or some other language is also not the preferred solution.
Comments: 4
Walter Roberson, 17 June 2011
Are they integers, or fixed-point numbers, or exponential notation?
mrBrown, 22 June 2011
At the moment they are floating-point numbers. But I'm really not looking for a specific solution for the file above; it's more an academic question of how to tackle problems like this in general.


Answers (5)

Jan, 18 June 2011
You want to identify the lines by their contents, so line-by-line processing is necessary.
But your file has 600 MB?! Then it could contain up to about 150,000,000 numbers, and pre-allocation is necessary for reasonable speed. Can you define an upper limit for the number of values? I include at least a partial pre-allocation:
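out = [];               % accumulated result, 2 x N
len = 10000;            % chunk size for the partial pre-allocation
part = zeros(2, len);   % buffer for the current chunk
ipart = 0;              % number of filled columns in part
fid = fopen(FileName, 'r');
if fid < 0, error('Cannot open file'); end
while 1  % Infinite loop, left via break at the end of the file
    s = fgets(fid);
    if ischar(s)
        % sscanf stops after 2 values, so a line that merely starts
        % with two numbers is accepted as well
        data = sscanf(s, '%g %g', 2);
        if length(data) == 2
            ipart = ipart + 1;
            part(:, ipart) = data;
            if ipart == len  % Buffer full: append it and start again
                out = cat(2, out, part);
                ipart = 0;
            end
        end
    else  % End of file:
        break;
    end
end
fclose(fid);
out = cat(2, out, part(:, 1:ipart));  % Append the partially filled buffer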

Ivan van der Kroon, 17 June 2011
This is not a very nice solution, but it worked for me
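fid = fopen('test.txt');
C = textscan(fid, '%s');  % Read all whitespace-separated tokens as strings
C = C{1};
a = [];
for j = 1:length(C)
    if length(C{j}) == 1  % Keep single-character tokens only
        a = cat(1, a, str2num(C{j}));
    end
end
fclose(fid);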
Comments: 1
mrBrown, 17 June 2011
It does indeed work for the simple test file that I presented.
In reality, however, there are also lines with more than two numbers (which should not be read) and numbers longer than one character.



mrBrown, 22 June 2011
Many thanks for all the replies. It seems that there are many ways to solve this problem line by line, but getting the work done with a single quick command seems to be impossible.
Finally, please find below yet another line-by-line solution.
Jan: smart way of allocating memory! (I took the lazy route.)
wantedlength = 5;
filename = '600mbTextfile.txt';
tic
% Reading the file takes about 30 seconds
lines = textread(filename, '%s', 'delimiter', '\r');
ind = length(lines);
toc
disp('processing');
% Processing takes about 200 seconds
result = zeros(ind, wantedlength);  % pre-alloc
counter = 1;
for iline = 1:length(lines)
    data = str2num(lines{iline});  % returns [] if the line is not numeric
    % this is the data you want, right?
    if isnumeric(data) && length(data) == wantedlength
        result(counter, :) = data;
        counter = counter + 1;
    end
end
result = result(1:counter-1, :);  % drop the unused pre-allocated rows
toc
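
As a possible variation on the loop above, the following sketch parses each line without str2num (which evaluates the text with eval). It splits the line into whitespace-separated tokens and converts them with str2double, which returns NaN for anything that is not a number. The sample lines and the names sampleLines, tokens and vals are made up for illustration; in practice the lines would come from the file.

```matlab
wantedlength = 2;
% Hypothetical stand-in for the lines read from the file
sampleLines = {'This is a header', '1 2', '3.5 -4e2', '7 8 9', '', '5 x'};

result = zeros(numel(sampleLines), wantedlength);  % pre-alloc
counter = 0;
for k = 1:numel(sampleLines)
    % Split the line into whitespace-separated tokens
    tokens = regexp(sampleLines{k}, '\S+', 'match');
    if numel(tokens) ~= wantedlength
        continue;  % wrong number of tokens on this line
    end
    vals = str2double(tokens);  % NaN for every non-numeric token
    if any(isnan(vals))
        continue;  % at least one token is not a number
    end
    counter = counter + 1;
    result(counter, :) = vals;
end
result = result(1:counter, :);  % drop the unused pre-allocated rows
disp(result);
```

One caveat of this sketch: str2double also accepts tokens such as Inf, and a lone i or j is parsed as the imaginary unit, so text lines made of such tokens would need an extra check.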

Yella, 22 June 2011
If it is a text file, you can use the MATLAB "load" function:
load file.txt
b = file;
where b is a matrix (MATLAB has a limit on the size of a matrix).
clc;
clear all;
%a=fopen('ravi.txt');
%b=fread(a,inf);
%b=textread('ravi.txt', '%s', 'whitespace', '')
load ravi.txt  % creates the matrix ravi from the numeric text file
b = ravi;
%z=reshape(b,484,6)
c = input('Enter the value of start node:');
d = input('Enter the value of end node:');
e = input('Enter the value of column: column 1: Node, column 2: SX, column 3: SY, column 5: SXY :::');
n = length(b(:,1));
result = [];
if (c > d)
    display('Re-run the program choosing start node <= end node')
else
    for i = 1:1:n
        % collect column e of every row whose node number lies in [c, d]
        if ((b(i,1) >= c) && (b(i,1) <= d))
            result = [result b(i,e)];
        end
    end
end
display(result);
ravi is a text file with 300 samples (all floating-point numbers) collected from ANSYS.
This worked for me; it might help you.
Comments: 1
Yella, 22 June 2011
Here is the link to the program:
http://www.mathworks.com/matlabcentral/fileexchange/31692-loading-text-document-in-matlab



Walter Roberson, 22 June 2011
You ruled out the short quick versions when you said that preprocessing with python or other languages was not the preferred solution.
This is the sort of thing that could be done relatively easily with a call to perl. perl can be called directly from MATLAB -- it is supplied with MATLAB and there is a specific perl() MATLAB command.
twonums.perl:
while (<>) { print if /^\s*-?\d+\.?\d*\s+-?\d+\.?\d*\s*$/ }
MATLAB:
nums = textscan(perl('twonums.perl',InputFileName),'%f%f','CollectOutput',1);
result = nums{1};
The perl expression I give is not perfect, but it is serviceable. For example it does not allow for the possibility that the number does not have a leading digit before the decimal point. Getting all the details right for exponential format can be difficult, with little details like that making quite a difference in how easy it is to write the regular expression.
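
To illustrate the kind of detail involved, here is a rough sketch of a more permissive number pattern, written for MATLAB's regexp rather than Perl. It additionally accepts an optional sign, a missing leading digit and an exponent. The names num, pairPat and candidates and the test strings are made up for this example, and the pattern is not claimed to cover every number format.

```matlab
% One number: optional sign, then digits with an optional fraction or a
% bare fraction such as .5, then an optional exponent
num = '[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?';
% A whole line holding exactly two such numbers
pairPat = ['^\s*' num '\s+' num '\s*$'];

% Hypothetical test lines
candidates = {'1 2', '-0.5 3.', '.5 6', '1e3 -2.5E-4', '1 2 3', 'abc 1', '1. e5'};

% With 'match' and 'once', regexp returns '' for lines that do not match
matched = regexp(candidates, pairPat, 'match', 'once');
isPair = ~cellfun('isempty', matched);
disp(isPair);
```

In this sketch the first four strings are accepted and the last three are rejected.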
You could also use regular expressions inside MATLAB; this will be slower than calling out to Perl, but it might allow you to skip some of those str2num() calls, as str2num() is fairly slow.
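
A minimal sketch of that MATLAB-only route, assuming the whole file fits in memory as one character array: match all two-number lines in one regexp call with the 'lineanchors' option, then convert the matched lines with a single sscanf. The variable txt below is a made-up stand-in for the file contents, which would normally come from something like fileread(InputFileName); the number pattern is one possible choice, not the only one.

```matlab
% Stand-in for: txt = fileread(InputFileName);
txt = sprintf('Header text\n1 2\n3.5 -4e2\n7 8 9\n.5 6\n');

% One number: optional sign, digits and/or fraction, optional exponent
num = '[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?';
% [ \t] instead of \s keeps a match from running over a line break;
% \r? tolerates Windows line endings
pat = ['^[ \t]*' num '[ \t]+' num '[ \t]*\r?$'];

% 'lineanchors' makes ^ and $ match at the start and end of every line
rows = regexp(txt, pat, 'match', 'lineanchors');

% Join the matched lines and convert all numbers in one sscanf call
vals = sscanf(sprintf('%s\n', rows{:}), '%g');
result = reshape(vals, 2, []).';  % one row per matched line
disp(result);
```

For a file of several hundred megabytes this needs a lot of memory, since MATLAB stores each character in 16 bits, so processing the file in large blocks may be the more practical variant.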
