How to make textscan robust against non-matching lines?

Question

Joan Vazquez on 8 Apr 2021

https://kr.mathworks.com/matlabcentral/answers/796102-how-to-make-textscan-robust-against-non-matching-lines

Commented: Stephen23 on 9 Apr 2021

data.txt

I have files with lines that I want to parse, preferably with textscan. In between those lines, there may be lines to be skipped (their format and number are unpredictable, but they are definitely on separate lines). What is the best way to deal with this?

E.g. for the attached data, this will stop outputting #HELLOMATHWORKS messages after line 4.

fid = fopen('data.txt');
out = textscan(fid,'#HELLOMATHWORKS,%[^,],%n');
fclose(fid);

This is an MWE taken from a large code base.
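To see where the scan gives up, the second output of textscan can help: for character input it reports how many characters were read. The snippet below is only a sketch. The sample text is made up (the real data.txt is not reproduced here), and the names txt, pos and nRead are illustrative.

```matlab
% Hypothetical sample text standing in for data.txt
txt = sprintf(['#HELLOMATHWORKS,COM1,2146\n', ...
               '#HELLOMATHWORKS,COM1,2147\n', ...
               'some unrelated line\n', ...
               '#HELLOMATHWORKS,COM1,2148\n']);

% Same format as in the question; pos is the number of characters read
[out, pos] = textscan(txt, '#HELLOMATHWORKS,%[^,],%n');

nRead = numel(out{2});              % numeric values read before the scan stopped
if pos < numel(txt)
    fprintf('textscan stopped early: %d value(s) read, %d of %d characters consumed.\n', ...
            nRead, pos, numel(txt));
end
```

Comparing pos with the length of the text is a cheap way to detect that some lines were never parsed.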


Answer 1

Stephen23 on 8 Apr 2021

https://kr.mathworks.com/matlabcentral/answers/796102-how-to-make-textscan-robust-against-non-matching-lines#answer_670412

Edited: Stephen23 on 9 Apr 2021

data.txt

str = fileread('data.txt');
tkn = regexp(str,'#HELLOMATHWORKS,([^,]+),(\S+)','tokens');
tkn = vertcat(tkn{:})
tkn = 6×2 cell array
    {'COM1'}    {'2146'}
    {'COM1'}    {'2147'}
    {'COM1'}    {'2148'}
    {'COM1'}    {'2149'}
    {'COM1'}    {'2150'}
    {'COM1'}    {'2151'}
vec = str2double(tkn(:,2))
vec = 6×1
        2146
        2147
        2148
        2149
        2150
        2151

Comments: 2

Joan Vazquez on 8 Apr 2021

Edited: Joan Vazquez on 8 Apr 2021

This does not produce the same output as my code:

tmp =

1×2 cell array

{6×1 cell} {6×1 double}

(Actually, my messages have many more fields; this was just an MWE with 2. I have many similar functions that use textscan to parse messages, and I wanted to avoid refactoring them.)

It is a good idea to work directly with regular expressions, but it seems that the formatSpec input parameter of textscan is not just any regular expression, it is more limited...

Anyway, it's OK for the moment. I'll accept the answer, thanks.
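If existing functions expect the 1×2 cell layout that textscan returns, the token array from the regular-expression approach can be repacked into that shape. A minimal sketch, assuming two fields as in the MWE; the sample text and the names txt and out are made up for illustration, since the contents of data.txt are not shown here:

```matlab
% Hypothetical sample text standing in for data.txt
txt = sprintf(['#HELLOMATHWORKS,COM1,2146\n', ...
               'some unrelated line\n', ...
               '#HELLOMATHWORKS,COM1,2147\n']);

% Extract the two fields of every matching message
tkn = regexp(txt, '#HELLOMATHWORKS,([^,]+),(\S+)', 'tokens');
tkn = vertcat(tkn{:});                      % N-by-2 cell array of char vectors

% Repack column-wise so the layout mirrors the textscan output:
% {N-by-1 cell of text, N-by-1 double}
out = {tkn(:,1), str2double(tkn(:,2))};
```

With more fields, each additional column of tkn would need its own conversion (for example str2double for numeric fields), so this only saves refactoring when the number of fields is small.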

Stephen23 on 9 Apr 2021

@Joan Vazquez: I presume that the text #HELLOMATHWORKS is not what is actually in your file. If the actual text contains some unique character that does not exist anywhere else in the file, you might be able to leverage the LineEnding/EndOfLine option to achieve the goal of reading the file data using textscan.

Answer 2

Joan Vazquez on 8 Apr 2021

https://kr.mathworks.com/matlabcentral/answers/796102-how-to-make-textscan-robust-against-non-matching-lines#answer_670177

This works, but it does not seem like the best solution. Ideally, I would tell textscan to "skip everything until a new line starts with #HELLOMATHWORKS".

filetext = fileread('data.txt');
expr = '[^\n]*#HELLOMATHWORKS[^\n]*';
% Find and return all lines that contain the text '#HELLOMATHWORKS'.
matches = regexp(filetext,expr,'match');
% Make it a 1xN char to feed textscan
goodlines = sprintf('%s\n', matches{:});
tmp = textscan(goodlines,'#HELLOMATHWORKS,%[^,],%n');
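
A variation on the same filter-then-parse idea, in case only lines that begin with the marker should be kept (the regular expression above also keeps lines where the marker appears mid-line, and textscan would then stop at them). This is a sketch under assumptions: the sample text and the names txt, prefix, allLines and keep are illustrative and not taken from the original code base.

```matlab
% Hypothetical sample text standing in for data.txt
txt = sprintf(['#HELLOMATHWORKS,COM1,2146\n', ...
               'note: #HELLOMATHWORKS,COM9,1\n', ...   % marker mid-line, should be skipped
               'some unrelated line\n', ...
               '#HELLOMATHWORKS,COM1,2147\n']);

prefix = '#HELLOMATHWORKS,';

% Split into lines (LF or CRLF) and keep only those starting with the prefix
allLines = regexp(txt, '\r?\n', 'split');
keep = strncmp(allLines, prefix, numel(prefix));

% Rebuild a single char vector and parse it with the unchanged format
goodlines = sprintf('%s\n', allLines{keep});
tmp = textscan(goodlines, '#HELLOMATHWORKS,%[^,],%n');
```

Because the format string passed to textscan stays the same, the filtering step could be wrapped in one small helper and reused by the other parsing functions.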
