How do you do regular expressions at the character level?

Question

Tom Bernand on 25 Oct 2023

https://kr.mathworks.com/matlabcentral/answers/2038226-how-do-you-do-regular-expressions-at-the-character-level

Edited: Walter Roberson on 31 Oct 2023

Hello all,

I am trying to find words in a text with a set of rules and then extract them. I am looking for words with a certain structure. The words themselves have different lengths and letters.

For example:

term_1 = "TER";
term_2 = "ZTnE";
term_3 = "ZEnP";

...

Since I have a lot of terms, I tried to create a pattern with character-level rules. To do this, I split up the terms and always looked to see which character could occur at which position in the string.

For the simple example above:

1st place:

seg_1 = '[TZ]'

2nd place:

seg_2 = '[ET]'

3rd place:

seg_3 = '[nR]'

4th place:

seg_4 = '[EP]'
seg = seg_1 + seg_2 + seg_3 + seg_4;
result = extract(term_2, seg)

This now works for a term with the same length, but term_1 is not recognised.

Therefore, I have now made the following adjustment and declared the 4th seg as optionalPattern:

seg_4 = optionalPattern("E" | "P");

With this change the extraction works. However, it now also extracts terms that skip an optional position in the middle.

Does anyone have any other ideas on how I can easily and safely include terms of different lengths?

Thank you very much!


Answer 1

Walter Roberson on 25 Oct 2023

https://kr.mathworks.com/matlabcentral/answers/2038226-how-do-you-do-regular-expressions-at-the-character-level#answer_1340136

Edited: Walter Roberson on 25 Oct 2023

"+" on character vectors is not a pattern operation.

seg_1 = '[TZ]'
seg_1 = '[TZ]'
seg_2 = '[ET]'
seg_2 = '[ET]'
seg = seg_1 + seg_2
seg = 1×4
   182   153   174   186
char(seg)
ans = '¶□®º'
p_1 = characterListPattern('TZ')
p_1 = pattern
  Matching:

    characterListPattern("TZ")
p_2 = characterListPattern('ET')
p_2 = pattern
  Matching:

    characterListPattern("ET")
p_1 + p_2
ans = pattern
  Matching:

    characterListPattern("TZ") + characterListPattern("ET")
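
Building on the pattern objects above, here is a minimal sketch of how the terms of different lengths from the question might be handled. It assumes R2020b or later and that only the last position may be absent; the variable names p_3, p_4, pat, terms, isTerm and found, and the non-matching test term "TEX", are made up for illustration.

```matlab
% One pattern per character position.
p_1 = characterListPattern("TZ");
p_2 = characterListPattern("ET");
p_3 = characterListPattern("nR");
p_4 = characterListPattern("EP");

% Only the last position is optional, so no position in the middle can be skipped.
pat = p_1 + p_2 + p_3 + optionalPattern(p_4);

terms = ["TER", "ZTnE", "ZEnP", "TEX"];   % "TEX" is a made-up non-matching term
isTerm = matches(terms, pat);             % true only when the whole string fits the pattern
found = terms(isTerm);                    % the terms that fit
```

matches tests the whole string, whereas extract would also return a fitting substring from inside a longer word. If the list of terms is fixed and known, an alternation of literals such as "TER" | "ZTnE" | "ZEnP" may be the safer choice, since per-position character lists also accept combinations that are not real terms.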

Comments: 4 (2 earlier comments not shown)

Walter Roberson on 29 Oct 2023

Edited: Walter Roberson on 31 Oct 2023

In terms of your original patterns, 'ZT E' would require seg_3 to match a space instead of [nR].

To allow a space in place of one of the characters, include a space inside the [] if you are using regexp:

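% Strings (double quotes) are used so that + concatenates the pieces
seg_1 = "[ TZ]";
seg_2 = "[ ET]";
seg_3 = "[ nR]";
seg_4 = "[ EP]";
seg = seg_1 + seg_2 + seg_3 + seg_4;
% term_2 is the string defined in the question
result = regexp(term_2, seg, 'match');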

If you want to allow whitespace more generally (such as tab), put \s in the [] instead of a space, for example "[\sTZ]".
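
As a rough sketch of how the \s variant could be combined with terms of different lengths, the expression below makes the fourth position optional with ? and anchors the match with ^ and $. The anchoring, the optional last position, the names expr, terms, hits and found, and the non-matching term 'TEX' are assumptions for illustration, not part of the original answer.

```matlab
% Positions 1-3 are required, position 4 is optional.
% \s allows whitespace at any position; ^ and $ force the whole term to match.
expr = '^[\sTZ][\sET][\snR][\sEP]?$';

terms = {'TER', 'ZTnE', 'ZEnP', 'ZT E', 'TEX'};   % 'TEX' is a made-up non-matching term
hits = regexp(terms, expr, 'match', 'once');       % empty where a term does not match
found = terms(~cellfun(@isempty, hits));           % keep only the matching terms
```

Without the anchors, regexp would also report fitting substrings inside longer words, which may or may not be wanted when searching running text.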

Tom Bernand on 31 Oct 2023

Perfect, that worked out great.

I must have been thinking in a complicated way.

Thank you very much!
