Counting occurrences of an exact word in a string array

Tae Lim on 4 Oct 2020
Edited: Stephen23 on 4 Oct 2020
Hi,
I am having trouble counting how many times a certain word appears in a string. I need to count the 'exact match' only, ignoring substrings. I tried 'count' and some other options but couldn't figure this out. The strings are chemical equations and the words are chemical species. Here's an example:
R1 = ["Ar* + Ar* "," Ar+ + Ar + e "]; % chemical reaction (string array; col1=reactants, col2=products)
S = {'Ar';'Ar*';'Ar+';'e'}; % species (cell array)
I want to get two variables as a result (below).
NumR = [0, 2, 0, 0]; % Number of occurrences of each species as reactants
NumP = [1, 0, 1, 1]; % Number of occurrences of each species as products
% each column represents the number of occurrences of Ar, Ar*, Ar+, e respectively
I tried the 'count' function, but it gives me the wrong result. For instance, if I count the occurrences of 'Ar', MATLAB also counts 'Ar+' and 'Ar*' as 'Ar' even though they are different species. How do I ignore substrings (i.e. 'Ar+' and 'Ar*' in this case)?
Thank you!

Accepted Answer

Stephen23 on 4 Oct 2020
>> C = {'Ar* + Ar* ',' Ar+ + Ar + e '};
>> S = {'Ar';'Ar*';'Ar+';'e'};
>> rgx = strcat('(?<=\s|^)',regexptranslate('escape',S),'(?=\s|$)');
>> NumR = cellfun(@numel,regexp(C{1},rgx))
NumR =
0
2
0
0
>> NumP = cellfun(@numel,regexp(C{2},rgx))
NumP =
1
0
1
1
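The same approach can be applied directly to the string array R1 from the question, returning row vectors in the requested layout. This is only a sketch: the loop and the `counts` matrix are illustrative additions, not part of the answer above, and each string is converted to char so that regexp is called with a single text input and a cell array of patterns.

```matlab
R1 = ["Ar* + Ar* ", " Ar+ + Ar + e "];   % col1 = reactants, col2 = products
S  = {'Ar';'Ar*';'Ar+';'e'};

% One pattern per species: the escaped name, bounded by whitespace or string ends.
rgx = strcat('(?<=\s|^)', regexptranslate('escape', S), '(?=\s|$)');

% counts(i,k) = number of exact matches of species i in column k of R1
counts = zeros(numel(S), numel(R1));
for k = 1:numel(R1)
    counts(:,k) = cellfun(@numel, regexp(char(R1(k)), rgx));
end

NumR = counts(:,1).';   % reactants, as a row vector
NumP = counts(:,2).';   % products, as a row vector
```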
Comments: 2
Tae Lim on 4 Oct 2020
Thank you for your response. This works great! Could you explain further how 'rgx' works? I do not follow the logic behind it.
Stephen23 on 4 Oct 2020
Edited: Stephen23 on 4 Oct 2020
"Could you explain further how 'rgx' works?"
rgx is a cell array of regular expressions, one for each corresponding cell in S.
A simple break-down of the regular expression, where XXX is the escaped string from S:
'(?<=\s|^)XXX(?=\s|$)'
'(?<=\s|^)           ' % lookbehind: must be space or start of string
'         XXX        ' % string from S (special characters are escaped first)
'            (?=\s|$)' % lookahead: must be space or end of string
These regular expressions are used by regexp to search the strings in C. When a substring matches this regular expression, regexp returns the indices. Then cellfun is used to count those indices (i.e. how many matches).
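To see the pieces at work, here is a small sketch that builds the pattern for a single species and compares it with a plain substring search. The variable names (`species`, `escaped`, `idxPlain`, `idxExact`) are only illustrative.

```matlab
str     = ' Ar+ + Ar + e ';
species = 'Ar';

% Escape special characters (no effect for 'Ar', but '+' and '*' would be escaped).
escaped = regexptranslate('escape', species);
rgx     = ['(?<=\s|^)', escaped, '(?=\s|$)'];

idxPlain = regexp(str, species);   % plain search: also hits the 'Ar' inside 'Ar+'
idxExact = regexp(str, rgx);       % only hits 'Ar' bounded by spaces or string ends

nPlain = numel(idxPlain);
nExact = numel(idxExact);
```

Here the plain search finds 'Ar' at positions 2 and 8, while the lookaround version keeps only position 8, because the 'Ar' at position 2 is followed by '+' rather than a space.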


More Answers (1)

KSSV on 4 Oct 2020
R1 = ["Ar* + Ar* "," Ar+ + Ar + e "]; % chemical reaction (string array; col1=reactants, col2=products)
s = "Ar" ;
n = nnz(strfind(R1,s))
Comments: 1
Tae Lim on 4 Oct 2020
Thank you for your response. I actually get an error saying that I have either missing or incorrect argument for nnz. The 'strfind(R1,s)' gives me 1x2 cell array. Is there a way to fix this?
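One possible way around the error, as a sketch rather than KSSV's intended fix: because strfind returns a cell array when given a string array, the indices can be counted per cell with cellfun. Note that this still counts substrings, so 'Ar' is also found inside 'Ar*' and 'Ar+'. The second half shows a token-based alternative for exact matches, which assumes species are always separated by whitespace; `nSub`, `nExact`, and `tokens` are illustrative names.

```matlab
R1 = ["Ar* + Ar* ", " Ar+ + Ar + e "];
s  = "Ar";

% strfind returns one index vector per string, wrapped in a cell array,
% so count the indices in each cell instead of passing the cell to nnz.
nSub = cellfun(@numel, strfind(R1, s));   % substring hits per column

% Exact-word count: split each string into whitespace-separated tokens
% and compare whole tokens against the species name.
nExact = zeros(size(R1));
for k = 1:numel(R1)
    tokens    = regexp(char(R1(k)), '\S+', 'match');   % cell array of tokens
    nExact(k) = nnz(strcmp(tokens, char(s)));
end
```

With the example data, nSub is [2 2] (substrings included) while nExact is [0 1], matching the 'Ar' entries of NumR and NumP.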
