Extracting specific repeating lines of text after a heading using fgetl and textscan

Question

Vincent Scalfani on 19 Jul 2016


https://kr.mathworks.com/matlabcentral/answers/296250-extracting-specific-repeating-lines-of-text-after-a-heading-using-fgetl-and-textscan

Commented: Vincent Scalfani on 21 Jul 2016

Here is an example of the data I am working with. I would like to extract the line directly following each KEY tag. The files have many thousands of these, so I need to create a loop with textscan or something similar.

> <NAME>
mary
> <AGE>
30
> <KEY>
RDHQFKQIGNG
> <NAME>
john
> <AGE>
56
> <KEY>
JFJNNFNFKFNN

Desired result:

RDHQFKQIGNG
JFJNNFNFKFNN

Here is where I am so far (adapted from a similar question in the past). The code does not seem to be moving the cursor: it works for the first KEY, but then grabs all of the data after it instead of just the line following each KEY line.

f = fopen('data.txt', 'rt'); 
tline = fgetl(f);
while isempty(strfind(tline, '> <KEY>'))
    if tline == -1 
        break;
    end
    tline = fgetl(f);
end
if tline ~= -1
    data = textscan(f,'%s','Delimiter','\r\n');
else
    disp('not found');
end
fclose(f);

Thanks!


Answer 1 (accepted)

Stephen23 on 19 Jul 2016


https://kr.mathworks.com/matlabcentral/answers/296250-extracting-specific-repeating-lines-of-text-after-a-heading-using-fgetl-and-textscan#answer_229053

>> str = fileread('temp1.txt');
>> C = regexp(str,'(?<=> <KEY>\s+)\S+','match')
C =
  'RDHQFKQIGNG'    'JFJNNFNFKFNN'

Tested on this file: temp1.txt
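
For readers who want to reproduce this without the attachment, here is a minimal self-contained sketch. It assumes the file looks exactly like the sample in the question; the temporary file name (from tempname) and the variable names sample, fname, fid and keys are only illustrative and are not part of the original answer.

```matlab
% Build the sample data from the question in a temporary file
% (done here only so that the sketch runs on its own).
sample = {'> <NAME>','mary','> <AGE>','30','> <KEY>','RDHQFKQIGNG', ...
          '> <NAME>','john','> <AGE>','56','> <KEY>','JFJNNFNFKFNN'};
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sample{:});   % one cell entry per line
fclose(fid);

% Read the whole file into a single char vector, then clean up.
str = fileread(fname);
delete(fname);

% (?<=> <KEY>\s+)  lookbehind: the match must be preceded by the KEY tag
%                  and at least one whitespace character (the line break)
% \S+              the match itself: a run of non-whitespace characters
keys = regexp(str, '(?<=> <KEY>\s+)\S+', 'match');
disp(keys)
```

Because the lookbehind is not part of the match, the tag itself never appears in the result, so no clean-up step is needed afterwards. Note that \S+ stops at the first whitespace character, so this only returns the whole line when the value contains no spaces, as in the sample data.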

Comments: 3 (one earlier comment is not shown)

Stephen23 on 20 Jul 2016

Try this:

  E = regexp(str,'^> <KEY>\s+\S+','match','lineanchors');
  E = strtrim(strrep(E,'> <KEY>',''));

And have a play with this script: temp1.m
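
A small sketch of how this anchored version differs from the lookbehind pattern in the answer. The second tag in the test string, sitting in the middle of a line, is made up purely for illustration and does not occur in the question's data.

```matlab
% Sample text: one KEY tag at the start of a line and one in the middle
% of a line (the mid-line case is a hypothetical example).
str = sprintf('> <KEY>\nRDHQFKQIGNG\nnote > <KEY> XYZ\n');

% Lookbehind version: matches after every occurrence of the tag,
% wherever it appears on the line.
C = regexp(str, '(?<=> <KEY>\s+)\S+', 'match');

% Anchored version: with 'lineanchors', ^ matches at the start of each
% line, so only tags that begin a line are matched. The tag is part of
% the match here, so it is removed with strrep and the leftover line
% break is removed with strtrim.
E = regexp(str, '^> <KEY>\s+\S+', 'match', 'lineanchors');
E = strtrim(strrep(E, '> <KEY>', ''));

disp(C)
disp(E)
```

On this string the lookbehind version also picks up XYZ, while the anchored version returns only the value that follows a tag at the start of a line. For files shaped like the sample in the question, both should give the same result.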

Vincent Scalfani on 21 Jul 2016

Amazing!!! PERFECT. It took 1 second to process over 4 million lines of text. Thanks so much for your time.
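
For completeness, a sketch of how the fgetl loop from the question could be made to work, in case reading line by line is preferred over regexp. In the question's code, textscan is called with '%s' and no repeat count, so it keeps reading until the end of the file, which explains why everything after the first KEY came back. The version below instead reads one extra line each time a KEY tag is seen. The sample file is written to a temporary location so the sketch runs on its own, and the names sample, fname, fid, value and keys are illustrative only.

```matlab
% Write the sample data from the question to a temporary file.
sample = {'> <NAME>','mary','> <AGE>','30','> <KEY>','RDHQFKQIGNG', ...
          '> <NAME>','john','> <AGE>','56','> <KEY>','JFJNNFNFKFNN'};
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

keys = {};
fid = fopen(fname, 'rt');
tline = fgetl(fid);
while ischar(tline)                    % fgetl returns numeric -1 at end of file
    if ~isempty(strfind(tline, '> <KEY>'))
        value = fgetl(fid);            % the line directly after the tag
        if ischar(value)               % guard against a tag on the last line
            keys{end+1} = strtrim(value); %#ok<AGROW>
        end
    end
    tline = fgetl(fid);                % advance to the next line
end
fclose(fid);
delete(fname);

disp(keys)
```

Testing with ischar(tline) rather than tline == -1 avoids comparing a char vector element-wise against a number. On files with millions of lines this loop is likely to be noticeably slower than the single regexp call, so it is mainly useful when the file is too large to read into memory at once.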
