Problems with a regex

Thomas, 9 July 2013
Hi.
I'm trying to create a regular expression to match and extract some information. Here are two examples of the source string:
Example one: 10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv
Example two: 10/2/leaf.nr.2 is a Projection error - 3D points.csv
I want to extract the string between "is a " and either " - touches edge" or " - 3D". In both example strings this would be "Projection error", but it can be something else.
Currently I have the pattern:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv'
For example one, this returns (not what I expected):
'Projection error - touches edge'
but for example two it returns (as expected):
'Projection error'
If I change the pattern to:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)(?:\s\-\s3D).*.csv'
so that (?:\s\-\stouches\sedge) is required to match, it returns (correctly):
'Projection error'
for example one, but now example two (which doesn't have the "touches edge" part) no longer matches (of course).
I don't understand why, with the first pattern, the result for example one also contains " - touches edge" when I ask for that group to be matched 0 or 1 times.
Any help will be highly appreciated.
Best regards, Thomas
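A minimal sketch of why the first pattern behaves this way, written in Python with the standard `re` module rather than MATLAB (an assumption made so it can be run anywhere; Python spells a named group `(?P<type>...)` where MATLAB uses `(?<type>...)`). The greedy `.*` inside the group takes as much text as it can and gives back only enough for the rest of the pattern to match. Because the "touches edge" group is optional, the rest of the pattern is already satisfied once the capture stops just before " - 3D", so " - touches edge" stays inside the capture. Making the capture lazy (`.*?`) reverses the preference: the capture grows one character at a time, and as soon as it holds "Projection error", the optional group can consume " - touches edge" and the match succeeds.

```python
import re
from typing import Optional

EX1 = "10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv"
EX2 = "10/2/leaf.nr.2 is a Projection error - 3D points.csv"

# The first pattern from the question, with MATLAB's (?<type>...) rewritten as Python's (?P<type>...).
GREEDY = r".*is\sa\s(?P<type>.*)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv"
# Identical except that the capture is lazy: .*? instead of .*
LAZY = r".*is\sa\s(?P<type>.*?)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv"


def extract(pattern: str, text: str) -> Optional[str]:
    """Return the 'type' group, or None if the pattern does not match."""
    m = re.match(pattern, text)
    return m.group("type") if m else None


for text in (EX1, EX2):
    # First column: greedy capture, second column: lazy capture.
    print(repr(extract(GREEDY, text)), repr(extract(LAZY, text)))
```

For example one the greedy version returns 'Projection error - touches edge' and the lazy version returns 'Projection error'; for example two both return 'Projection error'. The same change (`.*` to `.*?`) should carry over to the MATLAB pattern, since MATLAB's regexp also supports lazy quantifiers.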
Comment by Thomas, 9 July 2013
My current solution is to use this pattern instead:
'.*is\sa\s(?<type>[\w\s]*)(?:\s\-\s)?.*'
It gives the information I need, except that an extra space character is added. So the result for both example one and example two is now:
"Projection error "
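A Python sketch (again using `re` as a stand-in for MATLAB's regexp) of where the extra space comes from: `[\w\s]*` also accepts the space in front of the hyphen and stops only at "-", and since `(?:\s\-\s)?` is optional, it simply matches nothing at that point. Two possible fixes are shown: trimming afterwards (in MATLAB, `strtrim` does the same job), or a hypothetical variant pattern, `ENDS_ON_WORD`, that forces the capture to end on a word character.

```python
import re

EX1 = "10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv"
EX2 = "10/2/leaf.nr.2 is a Projection error - 3D points.csv"

# The workaround from the comment, rewritten with Python's named-group syntax.
WORKAROUND = r".*is\sa\s(?P<type>[\w\s]*)(?:\s\-\s)?.*"
# Hypothetical variant: the capture must end on a word character,
# so the space before " - " is left outside it.
ENDS_ON_WORD = r".*is\sa\s(?P<type>[\w\s]*\w)"

for text in (EX1, EX2):
    raw = re.match(WORKAROUND, text).group("type")
    print(repr(raw))            # includes the trailing space
    print(repr(raw.rstrip()))   # trailing whitespace removed after matching
    print(repr(re.match(ENDS_ON_WORD, text).group("type")))  # trimmed by the pattern itself
```

Both approaches inherit the workaround's limitation: a type containing characters outside `[\w\s]`, such as a hyphen, is cut short.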

Answers (2)

Muthu Annamalai, 9 July 2013
A simple way to parse the string with the rule
"is a " and ( " - touches edge" OR " - 3D" )
is to use sequential regexp() calls.
That way the "is a" part of your source is split out first, and then you can search for which of the two alternatives is present in your case.
Also see the 'NOT' (exclusion) character-class operators in regexp, and the 'split' mode of regexp.
http://www.mathworks.com/help/matlab/ref/regexp.html
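The answer describes the two-step approach without code. A possible Python sketch, assuming `re.split` with `maxsplit=1` as the counterpart of regexp's 'split' mode and `re.search` for the second step; `extract_sequential` is a hypothetical helper name, not something from the thread.

```python
import re
from typing import Optional, Tuple

EX1 = "10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv"
EX2 = "10/2/leaf.nr.2 is a Projection error - 3D points.csv"


def extract_sequential(text: str) -> Optional[Tuple[str, str]]:
    """Two-step parse: split off everything after "is a ", then find the terminator.

    Returns (type, terminator), or None if either step fails.
    """
    # Step 1: split once on "is a " and keep the part after it.
    parts = re.split(r"is a ", text, maxsplit=1)
    if len(parts) != 2:
        return None
    rest = parts[1]
    # Step 2: find the leftmost occurrence of either alternative and cut there.
    m = re.search(r" - touches edge| - 3D", rest)
    if m is None:
        return None
    return rest[:m.start()], m.group(0)


for text in (EX1, EX2):
    print(extract_sequential(text))
```

Returning the terminator as well shows which of the two alternatives was found, which is the point of doing the search as a separate step.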
Comment by Thomas, 9 July 2013
Thanks for your response.
My task is not to match either of the two cases; it's simply to extract the string between "is a " and the first " - " (a new, shorter formulation of my problem that I just realized).
Splitting would be one way to go, but I would like to know whether it's possible to write a regex for it.


per isakson, 9 July 2013
Edited: per isakson, 9 July 2013
You wrote: to extract the string between "is a " and the first " - ". That formulation is already close to pseudo-code for the expression we are looking for.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';
regexp( ex1, '(?<=is a )[^\-]+(?= \- )', 'match' )
regexp( ex2, '(?<=is a )[^\-]+(?= \- )', 'match' )
returns
ans =
'Projection error'
ans =
'Projection error'
Search the documentation for "Lookaround Assertions" or just "Lookaround" (see "Lookahead Assertions in Regular Expressions").
PS. '\-' or just '-': an unnecessary backslash (escape) seldom hurts, and I have trouble remembering when it's needed.
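For readers without MATLAB, the same lookaround pattern can be tried with Python's `re` module (an assumption: `re.findall` is used as the counterpart of regexp(..., 'match')). The lookbehind `(?<=is a )` and the lookahead `(?= \- )` check the surrounding text without including it in the result. `[^\-]+` first runs up to the hyphen, holding "Projection error " with its trailing space, and then gives back that space so the lookahead can see " - ". Because `[^\-]+` can never cross a hyphen, the match cannot extend past the first hyphen in the string.

```python
import re

EX1 = "10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv"
EX2 = "10/2/leaf.nr.2 is a Projection error - 3D points.csv"

# Lookbehind for "is a ", then one or more non-hyphen characters,
# then a lookahead for " - ". Only the middle part ends up in the match.
PATTERN = r"(?<=is a )[^\-]+(?= \- )"

for text in (EX1, EX2):
    # The pattern has no capture groups, so findall returns the whole matches.
    print(re.findall(PATTERN, text))
```

Both lines print ['Projection error'].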
Or, following the OP's original requirement:
regexp( ex1, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
regexp( ex2, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
The extra parentheses, (), make the expression more readable, in my opinion.
The "?" in ".+?" makes it a lazy expression: match as few characters as necessary.
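A Python sketch comparing the two patterns from this answer. The third input, `EX3`, is hypothetical (not from the thread) and has a hyphen inside the type: the negated class `[^\-]+` cannot get past that hyphen and finds nothing, while the lazy `.+?` keeps extending until one of the two terminators follows. In Python, `re.findall` would return the capture groups instead of the whole match for `LAZY_ALT`, because its extra parentheses are capturing groups, so `re.finditer` with `group(0)` is used; writing them as `(?:...)` would avoid the issue.

```python
import re

EX1 = "10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv"
EX2 = "10/2/leaf.nr.2 is a Projection error - 3D points.csv"
# Hypothetical input (not from the thread): a type that itself contains a hyphen.
EX3 = "10/3/leaf.nr.3 is a Sub-pixel error - 3D points.csv"

NEGATED = r"(?<=is a )[^\-]+(?= \- )"
LAZY_ALT = r"(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))"


def matches(pattern: str, text: str) -> list:
    # group(0) is the whole match; findall would return the capture groups instead.
    return [m.group(0) for m in re.finditer(pattern, text)]


for text in (EX1, EX2, EX3):
    print(matches(NEGATED, text), matches(LAZY_ALT, text))
```

For the two real examples both patterns agree; only the hyphenated type tells them apart.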
