MATLAB Answers

Extracting specific repeating lines of text after a heading using fgetl and textscan

Here is an example of the data I am working with. I would like to extract the line directly following each KEY tag. The files have many thousands of these, so I need to create a loop with textscan or something similar.
> <NAME>
mary
> <AGE>
30
> <KEY>
RDHQFKQIGNG
> <NAME>
john
> <AGE>
56
> <KEY>
JFJNNFNFKFNN
Desired result:
RDHQFKQIGNG
JFJNNFNFKFNN
Here is where I am at (adapted from a similar question asked in the past). The code does not seem to move past each match: it finds the first KEY line, but then grabs all of the data after it instead of just the line following the KEY line.
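f = fopen('data.txt', 'rt');
tline = fgetl(f);
% Skip lines until the first '> <KEY>' heading (fgetl returns -1 at end of file)
while ischar(tline) && isempty(strfind(tline, '> <KEY>'))
    tline = fgetl(f);
end
if ischar(tline)
    % textscan reads everything from here to the end of the file,
    % not just the single line after the heading
    data = textscan(f, '%s', 'Delimiter', '\r\n');
else
    disp('not found');
end
fclose(f);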
Thanks!
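A minimal sketch of the line-by-line approach the question was aiming for, assuming each value sits on the line immediately after its `> <KEY>` heading. Instead of handing the rest of the file to textscan, the loop keeps calling fgetl and stores only the line after each heading. The sample file written at the top and the variable names `fname` and `keys` are illustrative, not from the original post. Because it reads one line per iteration, it may be slower than the regexp answers below on very large files.

```matlab
% Sketch: collect the line after every '> <KEY>' heading using fgetl only.
% A small sample file (hypothetical; stands in for the real data.txt) is
% written first so the snippet runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '> <NAME>\nmary\n> <AGE>\n30\n> <KEY>\nRDHQFKQIGNG\n');
fprintf(fid, '> <NAME>\njohn\n> <AGE>\n56\n> <KEY>\nJFJNNFNFKFNN\n');
fclose(fid);

keys = {};                            % one cell per KEY value found
f = fopen(fname, 'rt');
tline = fgetl(f);
while ischar(tline)                   % fgetl returns -1 (numeric) at end of file
    if strcmp(strtrim(tline), '> <KEY>')
        tline = fgetl(f);             % advance to the line after the heading
        if ischar(tline)
            keys{end+1, 1} = strtrim(tline); %#ok<AGROW>
        end
    end
    tline = fgetl(f);                 % move on to the next line
end
fclose(f);
delete(fname);                        % remove the sample file
disp(keys)
```

The key difference from the code above is that the file position only ever advances one line at a time, so each heading is handled in turn rather than the first match consuming the remainder of the file.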


Accepted Answer

Stephen Cobeldick 19 Jul 2016
>> str = fileread('temp1.txt');
>> C = regexp(str,'(?<=> <KEY>\s+)\S+','match')
C =
'RDHQFKQIGNG' 'JFJNNFNFKFNN'
Tested on the attached file (temp1.txt).
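A different whole-file technique, offered as an alternative and not taken from the answer: read the text as above, split it into lines, and pick the line after each heading by index. To keep the snippet self-contained, the sample text is built in memory with sprintf instead of being read from temp1.txt; the variable names `lines`, `isKey`, `idx` and `keys` are illustrative. It avoids the variable-length lookbehind, but it has not been timed against the regexp versions.

```matlab
% Sketch: split the whole text into lines, then index the line after each
% '> <KEY>' heading. The sample text stands in for fileread('temp1.txt').
str = sprintf(['> <NAME>\nmary\n> <AGE>\n30\n> <KEY>\nRDHQFKQIGNG\n' ...
               '> <NAME>\njohn\n> <AGE>\n56\n> <KEY>\nJFJNNFNFKFNN\n']);

lines = regexp(str, '\r?\n', 'split');   % one cell per line; handles \n and \r\n
isKey = strcmp(strtrim(lines), '> <KEY>'); % logical mask of heading lines
idx = find(isKey);
idx = idx(idx < numel(lines));           % ignore a heading on the very last line
keys = strtrim(lines(idx + 1));          % the line right after each heading
disp(keys)
```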

Comments (3)

Vincent Scalfani 20 Jul 2016
Thanks so much! It works great with small text files, but on a file with about 20,000 of these tags it is very slow: it has been running for 4 hours and is still going. Any ideas on how to adjust the code to make it more efficient?
Stephen Cobeldick 20 Jul 2016
Try this:
E = regexp(str,'^> <KEY>\s+\S+','match','lineanchors');
E = strtrim(strrep(E,'> <KEY>',''));
And have a play with the attached script.
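A small variant of the 'lineanchors' pattern, sketched here as an option rather than as the answerer's code: putting the value in a capture group and asking regexp for 'tokens' returns just the value, so the strrep/strtrim cleanup step is not needed. The in-memory sample text and the names `T` and `keys` are illustrative; the performance on the 4-million-line file has not been checked.

```matlab
% Sketch: anchor '^' at line starts ('lineanchors') and capture the value
% that follows each '> <KEY>' heading as a token.
str = sprintf(['> <NAME>\nmary\n> <AGE>\n30\n> <KEY>\nRDHQFKQIGNG\n' ...
               '> <NAME>\njohn\n> <AGE>\n56\n> <KEY>\nJFJNNFNFKFNN\n']);

T = regexp(str, '^> <KEY>\s+(\S+)', 'tokens', 'lineanchors');
keys = [T{:}];   % each match is a 1x1 cell of tokens; concatenate into one cell row
disp(keys)
```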
Vincent Scalfani 21 Jul 2016
Amazing!!! PERFECT. It took 1 second to process over 4 million lines of text. Thanks so much for your time.
