Extracting values from string using regexp

Tae Lim, 3 Nov 2020
Edited: Stephen23, 5 Nov 2020
Hello,
I have a string array that contains a list of equations. I would like to read these equations and use them for calculation, but I have not found a good way to do this that avoids the 'eval' function (for efficiency). Instead, I'd like to extract the values using 'regexp' and reconstruct the equation, but I am struggling with how to set up the regular expression.
Here are example equations:
k1 = "6.8e-9*Te^0.67*exp(-4.4/(Te+0.5))"
k2 = "6.8e-9*exp(-4/Te)"
k3 = "1.2e7"
These equations follow the general form A * Te^n * exp(B/(Te+C)). I would like to extract the values of A, n, B, and C and store them in a matrix like [A, n, B, C], with one row per equation. So in this case, I would like to have the following as a result
k_value = [6.8e-9, 0.67, -4.4, 0.5; 6.8e-9, 0, -4, 0; 1.2e7, 0, 0, 0]
Once I have these values as a matrix, I can evaluate the original equation (given the value of Te) like
k = k_value(:,1) .* Te.^(k_value(:,2)) .* exp(k_value(:,3) ./ (Te + k_value(:,4)))
How can I use 'regexp' (or other method) to construct 'k_value' as above?
Thank you for your time!
Comments: 1
Stephen23, 5 Nov 2020
Extracting just the numbers is easy and efficient using regular expressions:
str = {'6.8e-9*Te^0.67*exp(-4.4/(Te+0.5))','6.8e-9*exp(-4/Te)','1.2e7'};
rgx = '[-+]?\d+\.?\d*([eE][-+]?\d+)?';
out = regexp(str,rgx,'match');
out{:}
ans = 1x4 cell array
{'6.8e-9'} {'0.67'} {'-4.4'} {'+0.5'}
ans = 1x2 cell array
{'6.8e-9'} {'-4'}
ans = 1x1 cell array
{'1.2e7'}
The hard part is knowing which part of the expression they come from, which the accepted answer does not do.
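To make that point concrete, here is a minimal sketch that converts the matches above to numbers with str2double. The variable name num is only illustrative and is not part of the original comment.

```matlab
% Same input and regular expression as in the comment above
str = {'6.8e-9*Te^0.67*exp(-4.4/(Te+0.5))','6.8e-9*exp(-4/Te)','1.2e7'};
rgx = '[-+]?\d+\.?\d*([eE][-+]?\d+)?';
out = regexp(str,rgx,'match');

% Convert each cell of matched text into a numeric row vector.
% 'num' is a hypothetical name used only for this illustration.
num = cellfun(@str2double,out,'UniformOutput',false);

% The rows have different lengths (4, 2 and 1 values), so they cannot be
% stacked into a matrix without knowing which coefficient each value is.
cellfun(@numel,num)
```

The second equation yields only two numbers, and nothing in the result says that the second one is B rather than n, so padding with zeros on the right would put -4 in the wrong column.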

Accepted Answer

Cris LaPierre, 4 Nov 2020
I'm not sure how to extract zero values for patterns that don't appear. I did want to point out that MATLAB has introduced new pattern matching capabilities in R2020b (see this blog post). This may help you, particularly if you, like me, can't make heads or tails of regular expressions.
You can find a list of the functions under the "Match Patterns" heading here.
It still takes some getting used to, but I was able to stitch together the following. Mind you, the outputs are still strings, but that's easy enough to handle (str2double). Since I couldn't think of a good way to automate filling in your coefficients matrix, I didn't attempt that.
k1 = "6.8e-9*Te^0.67*exp(-4.4/(Te+0.5))";
k2 = "6.8e-9*exp(-4/Te)";
k3 = "1.2e7";
pat = digitBoundary("start") + wildcardPattern(1,inf) + ...
lookAheadBoundary("*"|"/"|")"|textBoundary("end"));
extract(k1,pat)
ans = 4×1 string array
    "6.8e-9"
    "0.67"
    "4.4"
    "0.5"
extract(k2,pat)
ans = 2×1 string array
    "6.8e-9"
    "4"
extract(k3,pat)
ans = "1.2e7"
Comments: 4
Tae Lim, 5 Nov 2020
This is very helpful. Thank you!
Stephen23, 5 Nov 2020 (edited 5 Nov 2020)
So far this answer does not provide the final (and most important) step that Tae Lim asked for. Can someone show me how to generate the requested output matrix k_value automatically from this answer? Requested matrix:
k_value = [6.8e-9, 0.67, -4.4, 0.5; 6.8e-9, 0, -4, 0; 1.2e7, 0, 0, 0]
This answer returns strings with no indication of which part of the expression they come from, so it is unclear to me how it can be used to automatically generate the requested output matrix k_value.
@Cris LaPierre : can you please complete the answer? I am curious how you would do this.

More Answers (1)

Stephen23, 4 Nov 2020 (edited 4 Nov 2020)
The problem is not extracting the numbers (which is easy) but in knowing which of the numbers has been extracted, which is not a trivial task when different parts of the expression can be completely missing. But it can be done with regular expressions using optional grouping parentheses, which return empty strings if the content is not matched, allowing us to keep track of exactly which values have been matched:
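% regular expression building blocks:
rgd = '\d+\.?\d*';        % substituted for N: an unsigned decimal number
rge = '([eE][-+]?\d+)?';  % substituted for X: an optional exponent part
% Odd-numbered tokens (1,3,5,7) hold the numbers A, n, B, C.
% Even-numbered tokens (2,4,6) hold the optional literals 'Te^', 'exp('
% and '/(Te' or '/Te'; each conditional (?(k)...) tests token k.
rgx = ['^([-+]?NX)\*?(Te\^)?(?(2)[-+]?N)\*?(exp\()?',...
    '(?(4)[-+]?N)(?(5)/\(?Te)?(?(6)[-+]N)?\)?\)?$'];
rgx = strrep(rgx,'N',rgd);
rgx = strrep(rgx,'X',rge);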
% your input data:
str = {'6.8e-9*Te^0.67*exp(-4.4/(Te+0.5))','6.8e-9*exp(-4/Te)','1.2e7'};
tkn = regexp(str,rgx,'tokens','once');
tkn = vertcat(tkn{:})
tkn = 3x7 cell array
    {'6.8e-9'}    {'Te^'   }    {'0.67'  }    {'exp('  }    {'-4.4'  }    {'/(Te'  }    {'+0.5'  }
    {'6.8e-9'}    {0×0 char}    {0×0 char}    {'exp('  }    {'-4'    }    {'/Te'   }    {0×0 char}
    {'1.2e7' }    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}    {0×0 char}
format short g
mat = str2double(tkn(:,1:2:7));
mat(isnan(mat)) = 0
mat = 3×4
      6.8e-09         0.67         -4.4          0.5
      6.8e-09            0           -4            0
      1.2e+07            0            0            0
Note that this regular expression does not check for syntactic correctness: it can also match strings that are not syntactically correct expressions, so it relies on your a priori knowledge of the input strings. I also had to make some guesses about the permitted syntaxes, which so far you have not formally defined.
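Once the coefficient matrix exists, the evaluation described in the question can be sketched as follows. The value of Te is an assumption chosen only for illustration, and the matrix is typed in by hand here so that the snippet runs on its own.

```matlab
% Coefficient matrix [A, n, B, C], one row per equation (values from this thread)
mat = [6.8e-9, 0.67, -4.4, 0.5; ...
       6.8e-9, 0,    -4,   0;   ...
       1.2e7,  0,     0,   0];

% Hypothetical temperature value, not taken from the original question
Te = 2;

% Element-wise evaluation of A .* Te.^n .* exp(B ./ (Te + C)).
% Note the ./ inside exp: a plain / between two column vectors would be
% matrix right division rather than row-by-row division.
k = mat(:,1) .* Te.^mat(:,2) .* exp(mat(:,3) ./ (Te + mat(:,4)))
```

Rows with absent terms reduce correctly because Te.^0 and exp(0) are both 1. One caveat: if Te + C is exactly zero in a row where B is also zero, the exponent becomes 0/0 and the result is NaN, so such inputs would need separate handling.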
Comments: 1
Tae Lim, 5 Nov 2020
Hi Stephen, thank you so much for your reply! This works perfectly, but I chose Cris's answer because I am not familiar enough with regexp to modify your example and apply it to another form of equation. But I appreciate your response!
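For anyone wanting to adapt the approach to another form of equation, the core mechanism can be seen in a much smaller pattern. The sketch below is a toy and not the expression from the answer: it assumes inputs of the hypothetical form A or A*Te^n with integer values, and it leaves out the conditional operators. It relies on the documented behavior that an optional token which does not match is returned as an empty character vector.

```matlab
% Toy pattern (an assumption for illustration only):
% token 1 = A, token 2 = optional literal '*Te^', token 3 = optional n
rgx = '^([-+]?\d+)(\*Te\^)?(\d+)?$';
str = {'3*Te^2','5'};

% One 1x3 cell of tokens per input; unmatched optional tokens come back empty
tkn = regexp(str,rgx,'tokens','once');
tkn = vertcat(tkn{:});

% Keep the numeric tokens; empty text converts to NaN, which becomes 0
val = str2double(tkn(:,[1 3]));
val(isnan(val)) = 0
```

Because every input produces the same number of tokens, the column position identifies the coefficient, which is what allows the rows to be stacked into a matrix. Extending the pattern means adding one optional group per new term and selecting the matching token columns.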

Release: R2020b