How to identify whether a string matches a pattern?

mary on 3 Nov 2022
Commented: Walter Roberson on 9 Nov 2022
Hi,
I have a set of strings and I want to assign each of them to one of a set of function forms.
Let's consider the following set of strings:
setStrings{1} = '2*x + 3';
setStrings{2} = '2*exp(-3/x)'
setStrings{3} = 'sin(x)'
and the following set of functions:
func1 = @(x) A*x+B
func2 = @(x) A*exp(-B/x)
How can we check that setStrings{1} and setStrings{2} have the form of func1 and func2, respectively? And how can we find out that setStrings{3} does not follow the form of any of the available functions?
And finally, is there a way to simplify a string that has several multiplications inside? For instance:
'10*exp(-3/x)*1/5' -> '2*exp(-3/x)'
Thanks in advance for your help.

Accepted Answer

Walter Roberson on 4 Nov 2022
setStrings{1} = '2*x + 3';
setStrings{2} = '2*exp(-3/x)';
setStrings{3} = '-2*x - 3';
setStrings{4} = '-2*exp(3/x)';
setStrings{5} = 'sin(x)'
setStrings = 1×5 cell array
{'2*x + 3'} {'2*exp(-3/x)'} {'-2*x - 3'} {'-2*exp(3/x)'} {'sin(x)'}
first_kind = regexp(setStrings, '(?<A>-?\d+)\s*\*\s*x\s*(?<B>[+-]\s*\d+)', 'names')
first_kind = 1×5 cell array
{1×1 struct} {0×0 struct} {1×1 struct} {0×0 struct} {0×0 struct}
first_results = arrayfun(@(S) structfun(@str2num, S, 'uniform', 0), vertcat(first_kind{:}))
first_results = 2×1 struct array with fields:
A B
second_kind = regexp(setStrings, '(?<A>-?\s*\d+)\s*\*\s*exp\((?<negB>[+-]?\s*\d+)\s*/\s*x\s*\)', 'names')
second_kind = 1×5 cell array
{0×0 struct} {1×1 struct} {0×0 struct} {1×1 struct} {0×0 struct}
second_results = arrayfun(@(S) structfun(@str2double, S, 'uniform', 0), vertcat(second_kind{:}))
second_results = 2×1 struct array with fields:
A negB
You would then probably want to compute B = -negB.
That is, given the way you wrote the model, for -3/x you would want a positive 3 as the value of B. But I allowed for the possibility that the constant is positive, such as exp(4/x); in your terms that has to be parsed as exp(-(-4)/x), that is, as a case of exp(-B/x) with B being -4.
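A minimal sketch of that last step, reusing the second pattern from above on two of the example strings. The variable names tok, A, and B are introduced here for illustration only.

```matlab
% Two strings of the second kind, one with a negative and one with a
% positive constant inside exp().
setStrings = {'2*exp(-3/x)', '-2*exp(3/x)'};
pat = '(?<A>-?\s*\d+)\s*\*\s*exp\((?<negB>[+-]?\s*\d+)\s*/\s*x\s*\)';

tok = regexp(setStrings, pat, 'names');   % 1x2 cell, one struct per string
tok = vertcat(tok{:});                    % 2x1 struct array

A = str2double({tok.A});                  % captured text -> numbers
B = -str2double({tok.negB});              % model is A*exp(-B/x), so flip the sign
```

With these inputs the second string, -2*exp(3/x), comes out with B = -3, matching the exp(-(-3)/x) reading described above.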
The expressions get notably messier if you allow floating-point numbers. Part of the mess is having to handle all of the possibilities, such as 4 vs 4. vs 4.2 vs 0.2 vs .2 vs 4e-3 vs 4e+3 vs 4e3 vs .4e3 ..., while disallowing a plain . and .+3 . The expressions get less messy if you require at least one decimal digit before the period whenever there is a period at all.
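One possible way to write such a number pattern is sketched below. The names unsignedPat, numPat, isNum, and firstPat are hypothetical helpers that do not appear in the answer above, and the pattern has only been checked against the forms listed in the previous paragraph.

```matlab
% Unsigned number: digits with an optional fraction, or a fraction that
% starts with a period, followed by an optional exponent.
unsignedPat = '(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?';
numPat = ['[+-]?' unsignedPat];

% Helper: true where an entire char vector is a number in this form.
isNum = @(c) ~cellfun(@isempty, regexp(c, ['^' numPat '$'], 'match', 'once'));

okForms  = isNum({'4','4.','4.2','0.2','.2','4e-3','4e+3','4e3','.4e3'});
badForms = isNum({'.','.+3'});

% First kind, A*x+B, now with floating-point constants.
firstPat = ['^(?<A>' numPat ')\s*\*\s*x\s*(?<B>[+-]\s*' unsignedPat ')$'];
tok = regexp('2.5e-1*x + .75', firstPat, 'names');
A = str2double(tok.A);
B = str2double(regexprep(tok.B, '\s', ''));  % drop the space after the sign
```

The groups inside the number pattern are non-capturing, so only the named tokens A and B show up in the 'names' output.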
  Comments: 4
mary on 8 Nov 2022
Walter, to handle all of the possibilities you mentioned above (4 vs 4. vs 4.2 vs 0.2 vs .2 vs 4e-3 vs 4e+3 vs 4e3 vs .4e3), I use str2sym. However, there is one thing I cannot understand about the optional plus or minus sign: sometimes it does not work. In the example below, B = 0.5 is found for setStrings_New{2}, but B is left empty for setStrings_New{1}. Do you know why?
setStrings{1} = '0.166*x^-0.5*exp(846.339/x)';
setStrings{2} = '0.166*x^0.5*exp(846.339/x)';
setSyms = str2sym(setStrings);
setStrings_New = string(setSyms);
expression = '(?<A>-?\s*\d+\.?\d*)\s*(\*\s*x\s*\^\s*)?(?<B>[+-]?\d+\.?\d*)?\s*\*\s*exp\((?<C>[+-]?\s*\d+\.?\d*)\s*/\s*x\s*\)';
second_kind = regexp(setStrings_New, expression, 'names');
Moreover, I cannot make the last part of the expression optional, because a plain constant number is then classified as the second kind. This is what I tried:
expression = '(?<A>-?\s*\d+\.?\d*)\s*(\*\s*x\s*\^\s*)?(?<B>[+-]?\d+\.?\d*)?\s*(\*\s*exp\((?<C>[+-]?\s*\d+\.?\d*)\s*/\s*x\s*\))?';
Walter Roberson on 9 Nov 2022
Each time you have a * in the pattern (that is not \* ), it means "go as far as possible matching this same pattern".
If another pattern follows it in the regular expression, regexp() checks whether that next pattern matches at the point where the first one stopped matching, and if so processing continues, with that pattern in turn matching as far as it can if it has a * in it. This sequence of going as far as possible with each * continues (the same applies if you use a {} count modifier that specifies an indefinite count).
Eventually you may find that you are at the end of the input string and also at the end of the pattern, and in that case everyone is happy and everything is emitted in the order it was found.
But when you have a * modifier and more than one pattern in the regular expression, it is common to reach a point in the input that does not match the next pattern in the expression. When that happens, the matcher goes backwards by one optional repetition and checks whether it can go forward in the matches from there. If that does not work, it goes backwards by more repetitions and tries again. This process can end up unwinding long sequences of patterns and can end up using completely different branches of patterns than were used before. But the unwinding does not unwind as far as possible: it stops at the first point where the entire rest of the pattern can be matched successfully.
When you have optional portions of the pattern, this can lead to the situation where "obviously" there is a match for specific sub-expressions, but one of them was optional, and the unwinding process was able to satisfy the remaining expression by supposing that the intermediate component was empty.
Your expression has
(\*\s*x\s*\^\s*)?
which allows the *x^ component to be optional. But it also has
(?<B>[+-]?\d+\.?\d*)?
which allows the B component to be optional. One of the valid possibilities is that they are both absent. When you "back up" to that point, you are not necessarily going to match where you think you should.
This is why you might sometimes see expressions coded as (A|B|AB) instead of as A?B?. When you see a seemingly redundant list of possibilities, it may be there to eliminate the case in which none of the subsections match at all.
I have not chased through to find the exact failure that allows A and C to be matched but not B, but it looks to me as if you have not been strict enough in insisting that at least one of the possibilities is matched.
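The effect of two independently optional pieces can be illustrated on a simplified version of the expression. This is only a sketch: the names num, tail, loose, and strict and the two test strings are made up for the illustration, whitespace handling is left out, and it does not reproduce the exact failure in the strings above.

```matlab
num  = '[+-]?\d+\.?\d*';                  % simple number form used in this thread
tail = ['\*exp\((?<C>' num ')/x\)'];

% Loose: "*x^" and the exponent B are independently optional.
loose  = ['(?<A>' num ')(\*x\^)?(?<B>' num ')?' tail];
% Strict: B can only be present together with "*x^".
strict = ['(?<A>' num ')(\*x\^(?<B>' num '))?' tail];

bad  = '2*x^*exp(3/x)';                   % malformed: "x^" with no exponent
good = '2*x^-0.5*exp(3/x)';

looseOnBad   = regexp(bad,  loose,  'names');          % matches, B left empty
strictOnBad  = regexp(bad,  strict, 'match', 'once');  % no match, returns ''
strictOnGood = regexp(good, strict, 'names');          % B is '-0.5'
```

The loose pattern accepts the malformed string by treating B as absent, while the strict one rejects it. The same reasoning applies to an optional exp(...) part: if every piece after A is optional, a bare number matches, so the allowed combinations have to be listed explicitly or the pattern anchored with ^ and $.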


More Answers (1)

Matt J on 4 Nov 2022
Edited: Matt J on 4 Nov 2022
And finally, is there a way to simplify a string that has several multiplications inside?
If you have the Symbolic Math Toolbox
syms x
simplify(10*exp(-3/x)*1/5)
ans = 2*exp(-3/x)
How can we check that setStrings{1} and setStrings{2} have the form of func1 and func2, respectively? And how can we find out that setStrings{3} does not follow the form of any of the available functions?
You could do it by curve fitting each of the setStrings to each of your models and assessing the fitting error. Depending on the variety possible in setStrings you might be able to do things like the following,
setStrings{1} = '2*x+3';
setStrings{2} = '2*exp(-3/x)';
setStrings{3} = 'sin(x)';
N=numel(setStrings);
spl0={'A','B',''};
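for i=1:N
    % Split at each run of digits; both models contain exactly two constants,
    % so a matching string yields exactly three pieces.
    spl=split(setStrings{i},digitsPattern)';
    if numel(spl)~=3
        setStrings{i}='Unclassified';
    else
        % Interleave the pieces with the placeholder names A and B.
        setStrings{i} = strjoin([spl;spl0],'');
    end
end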
setStrings
setStrings = 1×3 cell array
{'A*x+B'} {'A*exp(-B/x)'} {'Unclassified'}
[tf,loc]=ismember(setStrings, {'A*x+B';'A*exp(-B/x)'})
tf = 1×3 logical array
1 1 0
loc = 1×3
1 2 0
  Comments: 1
mary on 4 Nov 2022
Thank you. Your solution works perfectly.
