Using regexp for space-delimited strings in a text file
sermet, 26 January 2016
Edited: Stephen23, 26 January 2016
I need to extract the lines that start with a given repeated string from the attached text file. For example, there are two lines in the data file that start with the string "P  1" (two spaces after P). I need to extract the 2nd to 4th columns of these lines, as follows:
array_P1=[ 6444.951599 -24080.372159 -8934.980576; 6645.371003 -22892.293251 -11497.619680];
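I use the following code (from Stephen Cobeldick) when the repeated string contains no spaces (for example P1):
fid = fopen('data_file.txt','rt');
str = fscanf(fid,'%c',Inf);   % read the whole file into one char vector
fclose(fid);
% 'lineanchors' makes ^ and $ match at the start and end of each line:
C = regexp(str,'^P1( +\S+)+\s+$','lineanchors','tokens');
% split the captured text into whitespace-separated fields, then convert to numbers:
C = regexp(vertcat(C{:}),'\S+','match');
N = str2double(vertcat(C{:}));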
But this does not work when the repeated string contains spaces, as in my example ("P  1").

Accepted Answer

Stephen23, 26 January 2016
Edited: Stephen23, 26 January 2016
Try this:
% textscan options:
opt = {'MultipleDelimsAsOne',true,'CollectOutput',true};
% required arrays:
str = 'X';
dtv = [];
dat = {};
% open textfile:
fid = fopen('data.txt','rt');
while ischar(str)
    % skip lines until first char is '*' (date vector), stopping at end of file:
    while ischar(str) && ~strncmp(str,'*',1)
        str = fgetl(fid);
    end
    if ~ischar(str)
        break % fgetl returned -1: end of file reached
    end
    % convert date vector to numeric:
    dtv(end+1,:) = str2double(regexp(str(2:end),'\S+','match')); %#ok<SAGROW>
    % get file position:
    pos = ftell(fid);
    % read first line of matrix:
    str = fgetl(fid);
    if ischar(str)
        % calculate how many columns in the matrix (after the 4-char label):
        N = numel(regexp(str(5:end),'\S+','match'));
        fmt = repmat('%f',1,N);
        % rewind one line:
        fseek(fid,pos,'bof');
        % read entire matrix (4-char label + N numeric columns):
        dat{end+1} = textscan(fid,['%4[^*]',fmt],opt{:}); %#ok<SAGROW>
    end
end
fclose(fid);
% concatenate data in cell arrays:
dat = vertcat(dat{:});
mat = vertcat(dat{:,2});
This reads each data matrix (the lines between consecutive date vectors) into a numeric matrix inside the cell array dat, and the date vectors into dtv. The number of columns is detected from the first line of each matrix, so it adjusts to however many columns your matrices have (the final vertcat into mat does require every block to have the same number of columns). Some important assumptions:
  • the first column consists of exactly four characters (which may be spaces).
  • the date vectors always start with an asterisk, and no other lines do.
  • there are no empty lines between the date vectors and the data matrices.
  • the matrices contain numeric data only.
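The two steps that let the loop adapt to the file (turning a '*' line into a date vector, and building the textscan format from the column count of the first data line) can be tried in isolation. A minimal sketch on hypothetical lines: the epoch values are made up, and the coordinates are the "P  1" values quoted in the question.

```matlab
% Hypothetical epoch line (made-up values) and first data line (values from the question).
epochLine = '*  2016  1 26  0  0  0.00000000';
dataLine  = 'P  1   6444.951599 -24080.372159  -8934.980576';

% Date vector: drop the leading '*', split on whitespace, convert each field to a number.
dtv = str2double(regexp(epochLine(2:end), '\S+', 'match'));

% Column count: skip the 4-character label 'P  1', count the remaining fields,
% then build the textscan format (label field followed by N numeric fields).
N   = numel(regexp(dataLine(5:end), '\S+', 'match'));
fmt = ['%4[^*]', repmat('%f', 1, N)];
```

Here dtv becomes [2016 1 26 0 0 0], N is 3 and fmt is '%4[^*]%f%f%f'; a data line with a fourth numeric column would simply yield one more '%f'.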
Have a look inside dat, and pick the data that you need:
>> cell2mat(cellfun(@(m)m(1,[1,2,3]),dat(:,2),'UniformOutput',false))
ans =
1.0e+04 *
0.6445 -2.4080 -0.8935
0.6645 -2.2892 -1.1498
I also concatenated the matrices into mat, which gives you all of the matrices in one array. This might be easier to access:
>> mat([1,10],[1,2,3])
ans =
1.0e+04 *
0.6445 -2.4080 -0.8935
0.6645 -2.2892 -1.1498
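Rather than hard-coding row numbers such as [1,10] (which depend on how many rows each block has), the rows could be picked by their four-character label. A minimal sketch, assuming the labels are collected with ids = vertcat(dat{:,1}) and come back as 'P  1', 'P 10', and so on; the ids and mat below are hypothetical stand-ins (rows 1 and 3 use the values from the question, rows 2 and 4 are placeholders):

```matlab
% Hypothetical stand-ins for the outputs of the code above.
% In the real code the labels would presumably come from ids = vertcat(dat{:,1}).
ids = {'P  1'; 'P 10'; 'P  1'; 'P 10'};
% Rows 2 and 4 are placeholder values.
mat = [6444.951599 -24080.372159  -8934.980576; ...
       1 2 3; ...
       6645.371003 -22892.293251 -11497.619680; ...
       4 5 6];

% Select rows by label instead of by row number.
% strtrim only removes leading/trailing blanks, so 'P  1' keeps its inner spaces.
isP1 = strcmp(strtrim(ids), 'P  1');
array_P1 = mat(isP1, 1:3);
```

This returns the same two rows as mat([1,10],[1,2,3]) would for the real file, but keeps working if the number of rows per block changes.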
I tested this code on both of the files you provided (with this question and with your last question), which are also available here:
sermet, 26 January 2016
Thank you, Stephen. I always really appreciate your help.

Additional Answers (1)

Guillaume, 26 January 2016
Well, I guess it's time for you to learn the regular expression language.
This regex should work for you:
'^P\s*1( +\S+)+\s+$'
It simply allows zero or more (the *) whitespace characters (the \s) between P and 1.
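A self-contained variant of the same idea, sketched on hypothetical in-memory text rather than the real file. It uses ' *' (spaces only) between P and 1, requires at least one blank after the 1 so that a label such as "P 10" is not matched, and captures the rest of each line as a single token instead of relying on the repeated group ( +\S+)+. The "P  1" values are from the question; the "P 10" row is a placeholder.

```matlab
% Hypothetical sample text standing in for the data file.
txt = sprintf(['P  1   6444.951599 -24080.372159  -8934.980576\n', ...
               'P 10      1.000000      2.000000      3.000000\n', ...
               'P  1   6645.371003 -22892.293251 -11497.619680\n']);

% ^P *1        : P, zero or more spaces, then 1 (matches "P1" and "P  1")
% [ \t]+       : at least one blank after the 1, so "P 10" is rejected
% ([^\n]*\S)   : one token holding the rest of the line, never crossing a newline
tok = regexp(txt, '^P *1[ \t]+([^\n]*\S)', 'lineanchors', 'tokens');

tok  = [tok{:}];                        % 1xK cell array of char rows
flds = regexp(tok, '\S+', 'match');     % one cellstr of fields per matched line
num  = str2double(vertcat(flds{:}));    % K x M numeric matrix
array_P1 = num(:, 1:3);                 % the first three numeric fields
```

With the real file, txt would be the char vector read by fscanf (or fileread) in the question's code.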
sermet, 26 January 2016
I modified the code as you explained:
C = regexp(str,'^P\s*1( +\S+)+\s+$','lineanchors','tokens')
But it produced an empty C:
{}