
Define a regular expression for strsplit?

FishermanJack on 23 Nov 2017
Commented: FishermanJack on 23 Nov 2017
Hi, I have a text file consisting of multiple headers with data in between. I want to use strsplit to find the headers and save them in a variable.
The problem is that the headers all differ from one another, except for the first word and the character that follows it.
Example:
  1. abc 0 AAA BB CC DDD 111
  2. abc 0 EEE FF GGG HH II 120
The leading "1." and "2." are not part of the header. The abc and the 0 are tab-delimited. After the 0 there is no common pattern in the line: some words are tab-delimited and some space-delimited, and some contain numbers while others do not.
Although the file has thousands of headers, the ending can only be one of these:
110, 005, 006, 010, 133/1A, 230, 400, NWD
The expression I started with looks like this:
xpr = '(?m-s)^abc\s+';
But how do I define the ending?
Any suggestions?

Accepted Answer

per isakson on 23 Nov 2017
Edited: per isakson on 23 Nov 2017
I understood the question differently.
str = fileread( ... );
[ data_blocks, headers ] = strsplit( str, '(?m-s)^abc\t0.+$', 'DelimiterType','RegularExpression' )
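A minimal sketch of what the two outputs hold, using a made-up two-header sample in place of the real file (the sample text is an assumption, not data from the question). It calls regexp directly with the 'lineanchors' and 'dotexceptnewline' options, which should behave like the inline (?m-s) flags used above:

```matlab
% Hypothetical sample: two header lines, each followed by one line of data.
str = sprintf('abc\t0 AAA BB CC DDD 111\n1 2 3\nabc\t0 EEE FF\tGGG HH II 120\n4 5 6\n');

% 'lineanchors' makes ^ and $ match at line boundaries;
% 'dotexceptnewline' keeps .+ from running past the end of the header line.
[headers, data_blocks] = regexp(str, '^abc\t0.+$', 'match', 'split', ...
    'lineanchors', 'dotexceptnewline');

% headers has one entry per header line; data_blocks holds the text
% before, between, and after the headers.
fprintf('%d headers, %d data blocks\n', numel(headers), numel(data_blocks));
```

Because the sample starts with a header, data_blocks{1} is empty and the first real block of data is data_blocks{2}.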
  5 Comments
per isakson on 23 Nov 2017
I didn't fully understand the question, so I made some assumptions:
  1. The entire lines shown in the question are headers. The blocks of data between the headers are not shown in the question.
  2. Every line starting with abc\t0 is a header. ( \t for tab )
  3. I still don't understand the role of the string NWD. Does it indicate the last header of the file? I ignored it.
FishermanJack on 23 Nov 2017
1. Yes. 2. Yes. 3. No. These are just some random characters that appear at the end of some headers, like 110 or 005.
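If only lines that end in one of the listed tokens should count as headers, the ending can be written as an alternation anchored at the end of the line. The following is a sketch under that assumption; the sample string and the variable names endings and xpr are made up for illustration:

```matlab
% Hypothetical sample: the third "abc" line ends in 999, which is not a listed ending.
str = sprintf('abc\t0 AAA BB 110\n1 2\nabc\t0 EEE\tFF NWD\n3 4\nabc\t0 XXX 999\n');

% Alternation of the endings listed in the question.
endings = '(?:110|005|006|010|133/1A|230|400|NWD)';

% A header starts with abc<TAB>0 and ends with a space or tab followed by one ending.
xpr = ['^abc\t0.*[ \t]', endings, '$'];

headers = regexp(str, xpr, 'match', 'lineanchors', 'dotexceptnewline');
```

With this sample, the line ending in 999 is not returned. If the file uses Windows line endings, a carriage return may sit before each line break, in which case the pattern would probably need to end in \r?$ instead of $.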


More Answers (1)

Walter Roberson on 23 Nov 2017
S = fileread('YourFile.txt');
regexp(S, '^abc\s.*?110,\s+005,\s+006,\s+010,\s+133/1A,\s+230,\s+400,\s+NWD', 'match', 'lineanchors')  % lazy .*? so each match stops at the first such ending
If all of the headers are the same and there is nothing between the 110* line and the next header, then consider
regexp(S, '^abc\s', 'split', 'lineanchors')
The 'abc' will be removed from each block during the splitting; it would be possible to get around that but doing so is somewhat more obscure.
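One possible way around the removal, sketched here as an assumption rather than as the approach the answer had in mind: ask regexp for both 'split' and 'match', then glue each matched delimiter back onto the block that follows it. The sample string and the names blocks, delims, and restored are made up for illustration:

```matlab
% Hypothetical sample: two headers, each followed by one line of data.
S = sprintf('abc\t0 AAA 110\n1 2 3\nabc\t0 EEE NWD\n4 5 6\n');

% Outputs come back in the order the options are listed: split first, then match.
[blocks, delims] = regexp(S, '^abc\s', 'split', 'match', 'lineanchors');

% blocks{1} is the text before the first header (empty here), so skip it and
% prepend each matched 'abc<whitespace>' to the block it introduced.
restored = strcat(delims, blocks(2:end));
```

For cell array inputs strcat keeps trailing whitespace, so the tab after abc and the line breaks in each block survive the concatenation.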
