Extract regexp tokens with regexpPattern

Jan Kappen on 29 Feb 2024
Commented: Jan Kappen on 29 Feb 2024
With regexp I could extract the tokens of my capture groups via
regexp("abcd3e", "\w+(\d)+\w", "tokens")
ans = 1×1 cell array
{["3"]}
The result is a cell array. With the new regexpPattern and extract functions, the return values are usually strings or string arrays, which I prefer.
Question: Is there an analogue of the above regexp call, something like extract("abcd3e", regexpPattern("\w+(\d)+\w"), "tokens")? This syntax obviously does not work in R2023b, but is there a standard way to rewrite such patterns so that they return my tokens?
Thanks,
Jan
EDIT: This is just a toy example. I do not only want to extract digits, which could be done with digitsPattern. Ideally, I'd like to understand how to translate the regexps directly.
To show a more realistic example:
str = [
"42652Z_HEX"
"42652X"
"42652Y"
"42652Z"
"42652GYRO-X_HEX"
"42652GYRO-Y_HEX"
"42652GYRO-Z_HEX"
"42351Temp_HEX"
"42652Temp_HEX"
"42652GYRO-X"
"42652GYRO-Y"
"42652GYRO-Z"
"42351Temp"
"42652Temp"
];
res = string(regexp(str, "\d+(?:GYRO-)?([XYZ])?.*", "tokens"))
res = 14×1 string array
"Z"
"X"
"Y"
"Z"
"X"
"Y"
"Z"
""
""
"X"
"Y"
"Z"
""
""
% how to get the same result with matches and regexpPattern?
  2 Comments
Dyuman Joshi on 29 Feb 2024
Moved: Dyuman Joshi on 29 Feb 2024
If you just want to extract the numbers between letters:
str = "abcd3e57xyz";
out = extract(str, digitsPattern)
out = 2×1 string array
"3"
"57"
Jan Kappen on 29 Feb 2024
Moved: Dyuman Joshi on 29 Feb 2024
Thanks for your answer.
No, I do not only want to extract numbers; it's a toy example. I'd like to translate the existing regexps into the new regexpPattern, if possible. The regexps might get more complicated than the one shown. I'll edit my question accordingly.


Answers (1)

the cyclist on 29 Feb 2024
I realize that this is not really an answer to your question, but I just wanted to make sure you are aware that one option is to wrap the string function around the regexp:
string(regexp("abcd3e fghi4j", "\w+(\d)+\w", "tokens"))
ans = 1×2 string array
"3" "4"
Also, if you are guaranteed to have only one match, you could do
regexp("abcd3e", "\w+(\d)+\w", "tokens","once")
ans = "3"
but that's somewhat fragile coding, I would say.
I'm not yet sure if there is a more "direct" way with more recent functions.
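A minimal sketch of a less fragile variant of the two calls above, assuming only the first match per element matters. It loops over the string array, calls regexp with "tokens" and "once" on each element, and writes the first token into a preallocated string array, leaving "" where nothing matched. The variable names and the extra "no-digits" element are illustrative and not from the thread.

```matlab
% Illustrative input: three elements from the question plus one
% hypothetical element ("no-digits") that the expression cannot match.
str  = ["42652Z_HEX"; "42652GYRO-X"; "42351Temp"; "no-digits"];
expr = "\d+(?:GYRO-)?([XYZ])?.*";

% Preallocate the result; "" stands for "no token found".
res = strings(size(str));

for k = 1:numel(str)
    % "once" returns only the tokens of the first match for this element.
    tok = regexp(str(k), expr, "tokens", "once");
    if ~isempty(tok)
        % Keep the first capture group; string() guards against the
        % token coming back as a cell or char instead of a string.
        res(k) = string(tok(1));
    end
end

disp(res)
```

Because res is preallocated with strings, its size always equals the size of str, regardless of how many elements match.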
  2 Comments
the cyclist on 29 Feb 2024
Your updated question clarifies that my answer is not what you are looking for, but I'll leave it here anyway. :-)
Jan Kappen on 29 Feb 2024
Thank you very much for your answer.
Yes, I updated the question to clarify a bit, sorry.
There were cases in the past where I could not cast to string; I'll need to check why. In fact, that's not a terrible solution, but I'm simply wondering how to use the new regexpPattern properly, and maybe I'm missing something.
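For the toy example, one way to approximate a capture group with pattern objects is to move the surrounding context into look-around boundaries, so that only the wanted part is matched and extract returns it as a string. This is a sketch under assumptions, not a confirmed general translation of regexp tokens: it uses lookBehindBoundary and lookAheadBoundary, narrows \w to letters via lettersPattern, and returns every run of digits between two letters, whereas the repeated group (\d)+ in the original expression would keep only the last digit it matched.

```matlab
% Toy input taken from the comment above.
str = "abcd3e57xyz";

% Match a run of digits only where a letter stands directly before it
% and directly after it. The boundaries check the context without
% making it part of the match, which roughly plays the role of the
% text outside the capture group.
pat = lookBehindBoundary(lettersPattern(1)) ...
    + digitsPattern ...
    + lookAheadBoundary(lettersPattern(1));

% extract returns a string array with one element per match.
out = extract(str, pat);
disp(out)
```

Whether this approach scales to expressions with optional or multiple capture groups, such as the sensor-name example in the question, is not established in this thread.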

Release

R2023b
