Finding the indexes of multiple substrings within a larger string.

Steve on 24 Mar 2023
Commented: Steve on 1 Apr 2023
I’m trying to find the indexes of all two-digit pairs in a very long string of digits, say “c”. I can easily find all occurrences of one pair at a time, for example strfind(c,'00') … strfind(c,'01'). But I want a way to do this for all one hundred pairs, 00 to 99. I tried this:
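x=0:99;
dig=sprintf('%02d ',x);
% converts the vector 0:99 into a char vector of two-digit numbers separated by spaces
dub_dig=strsplit(dig);
% splits the text into a cell array with one pair per cell
% (the trailing space in dig leaves an extra empty cell at the end)
dub_dig_str=string(dub_dig);
% converts the cell array into a string array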
How do I get this sequence of strings (dub_dig_str) to work in something like a for loop using the strfind function? When I try this it crashes. I would like to output a matrix of indexes of where each pair occurs, for all pairs.
Thanks
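For reference, the loop described in the question could be sketched as follows. This is only a minimal illustration under assumptions: the input `c` here is a short made-up example, and the names `idx` and `pat` are hypothetical. Because each pair can occur a different number of times, the results are stored in a cell array rather than a rectangular matrix.

```matlab
c = '0012300';                    % short illustrative input (made-up data)
idx = cell(100, 1);               % idx{k+1} holds the start positions of pair k
for k = 0:99
    pat = sprintf('%02d', k);     % '00', '01', ..., '99'
    idx{k+1} = strfind(c, pat);   % overlapping matches are included
end
idx{1}                            % start positions of '00'
```

Building each pattern inside the loop with sprintf avoids the separate split-and-convert steps.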

Accepted Answer

Stephen23 on 24 Mar 2023
Edited: Stephen23 on 24 Mar 2023
idx = regexp(c,'\d\d') % no overlaps
idx = regexp(c,'\d(?=\d)') % with overlaps
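To make the difference between the two patterns concrete, here is a small sketch on a made-up input (the variable names are illustrative only). The first pattern consumes both digits, so matches cannot share a character. The second uses a lookahead, so every digit that is followed by another digit counts as a pair start.

```matlab
c = '12345';                                % short illustrative input
idx_no_overlap = regexp(c, '\d\d');         % pairs '12' and '34'
idx_overlap    = regexp(c, '\d(?=\d)');     % pairs '12', '23', '34', '45'
pairs = c([idx_overlap; idx_overlap + 1]).';  % one pair per row, as a char matrix
```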
  7 Comments
Stephen23 on 26 Mar 2023
Edited: Stephen23 on 27 Mar 2023
"My goal is to output a separate row of indexes for each pair of numbers (one hundred total, 00to99), stating where each appears in c."
Aaah, so you actually want to compare the pairs against another set with a specific order, which is what you were achieving with the loop. Here is an alternative approach:
c = char(randi(+'09',1,123)) % random data
c = '636906240866589674219474419874013401492319709264753858931901195001783553643124423974644494528171633587231581331511128165489'
% Character pairs:
[T,U] = meshgrid('0':'9'); % all pairs
P = cellstr([T(:),U(:)]) % all pairs
P = 100×1 cell array
{'00'} {'01'} {'02'} {'03'} {'04'} {'05'} {'06'} {'07'} {'08'} {'09'} {'10'} {'11'} {'12'} {'13'} {'14'} {'15'} {'16'} {'17'} {'18'} {'19'} {'20'} {'21'} {'22'} {'23'} {'24'} {'25'} {'26'} {'27'} {'28'} {'29'}
Q = cellstr(c([1:end-1;2:end]).'); % data pairs
% Find indices of data pairs:
[~,X] = ismember(Q,P);
% Place indices into cell array:
Y = (1:numel(Q)).';
Z = accumarray(X,Y,[100,1],@(a){a})
Z = 100×1 cell array
{[ 64]} {4×1 double} {0×0 double} {0×0 double} {0×0 double} {0×0 double} {[ 5]} {0×0 double} {[ 9]} {[ 44]} {0×0 double} {3×1 double} {2×1 double} {2×1 double} {[ 36]} {2×1 double}
Checking the indices of '00' and some random pair:
Z{1}
ans = 64
Z{strcmp(P,'23')}
ans = 3×1
39 80 103
You can probably do something similar with table operations. Let's try it now:
D = cell2table(Q, 'VariableNames',"Pair");
D.Index = (1:numel(Q)).';
G = groupsummary(D,"Pair",@(a){a})
G = 69×3 table
     Pair     GroupCount     fun1_Index
    ______    __________    ____________
    {'00'}        1         {[ 64]}
    {'01'}        4         {4×1 double}
    {'06'}        1         {[ 5]}
    {'08'}        1         {[ 9]}
    {'09'}        1         {[ 44]}
    {'11'}        3         {3×1 double}
    {'12'}        2         {2×1 double}
    {'13'}        2         {2×1 double}
    {'14'}        1         {[ 36]}
    {'15'}        2         {2×1 double}
    {'16'}        2         {2×1 double}
    {'17'}        2         {2×1 double}
    {'19'}        5         {5×1 double}
    {'21'}        1         {[ 19]}
    {'23'}        3         {3×1 double}
    {'24'}        2         {2×1 double}
Steve on 1 Apr 2023
Thank you. This works. I must admit that, as a beginner, some of the code looks cryptic (e.g., "@(a){a}"), and the cell array output 'Z' is hard to work with mathematically, but I'm sure it's possible. I'm appreciating the tradeoffs between classic numerical functions and string approaches.
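As a rough illustration of the two points raised here: the anonymous function @(a){a} takes the values that accumarray has grouped under one subscript and wraps them in a cell, so each group can have a different length. The sketch below uses made-up data and hypothetical variable names (`subs`, `vals`, `n`, `M`). It also shows one possible way to turn such a cell array into counts and into a NaN-padded numeric matrix, which may be easier to work with mathematically.

```matlab
subs = [1; 2; 1; 3; 2];                 % group number of each value (made-up data)
vals = (1:5).';                         % values to be grouped
Z = accumarray(subs, vals, [4 1], @(a){a});   % one cell per group, group 4 is empty

n = cellfun(@numel, Z);                 % number of values in each group
M = nan(numel(Z), max(n));              % NaN-padded matrix, one row per group
for k = 1:numel(Z)
    if n(k) > 0
        M(k, 1:n(k)) = sort(Z{k}).';    % sort, since order within a group is not guaranteed
    end
end
```

The sort call is there because accumarray only preserves the order of values within a group when the subscripts are sorted.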


More Answers (1)

Walter Roberson on 24 Mar 2023
c = 'a91bb48353'
c = 'a91bb48353'
mask = ismember(c, '0':'9');
odd_pair = find(mask(1:2:end-1) & mask(2:2:end)) * 2 - 1
odd_pair = 1×2
7 9
even_pair = find(mask(2:2:end-1) & mask(3:2:end)) * 2
even_pair = 1×3
2 6 8
pair_starts_at = union(odd_pair, even_pair)
pair_starts_at = 1×5
2 6 7 8 9
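The odd/even split above can also be collapsed into a single comparison of the mask against a shifted copy of itself. This is an alternative sketch, not part of the original answer; it marks every position where a digit is immediately followed by another digit.

```matlab
c = 'a91bb48353';
mask = ismember(c, '0':'9');                          % true where c holds a digit
pair_starts_at = find(mask(1:end-1) & mask(2:end));   % digit followed by a digit
```

For this input the result matches the union computed above.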
  2 Comments
Walter Roberson on 26 Mar 2023
c = char(randi([0 9], 1, 30) + '0')
c = '305452612469209463851343808968'
C = c - '0';
odds = C(1:2:end-1) * 10 + C(2:2:end);
evens = C(2:2:end-1) * 10 + C(3:2:end);
odd_idx = (1:numel(odds)) * 2 - 1;
even_idx = (1:numel(evens)) * 2;
indices = accumarray([odds(:); evens(:)] + 1, [odd_idx(:); even_idx(:)], [], @(locs){locs});
populated = find(~cellfun(@isempty, indices));
[num2cell(populated-1), indices(populated)]
ans = 27×2 cell array
    {[ 5]}    {[ 2]}
    {[ 8]}    {[ 26]}
    {[ 9]}    {[ 14]}
    {[12]}    {[ 8]}
    {[13]}    {[ 21]}
    {[20]}    {[ 13]}
    {[24]}    {[ 9]}
    {[26]}    {[ 6]}
    {[30]}    {[ 1]}
    {[34]}    {[ 22]}
    {[38]}    {2×1 double}
    {[43]}    {[ 23]}
    {[45]}    {[ 4]}
    {[46]}    {2×1 double}
    {[51]}    {[ 20]}
    {[52]}    {[ 5]}
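The idea in this approach is that each digit pair is encoded as the number 10*d1 + d2, which (plus one) serves directly as a row subscript for accumarray. As a hedged variant, not taken from the original comment, the odd and even passes can be merged by pairing every digit with its right-hand neighbour. The input below is a short made-up example, and `vals` and `starts` are illustrative names.

```matlab
c = '30545';                          % short illustrative input
C = c - '0';                          % digit characters -> numbers 0..9
vals = C(1:end-1) * 10 + C(2:end);    % numeric value of the pair starting at each position
starts = (1:numel(vals)).';           % start position of each pair
indices = accumarray(vals(:) + 1, starts, [100 1], @(locs){locs});
indices{30 + 1}                       % start positions of the pair '30'
```

Passing the size [100 1] gives one cell for every pair from 00 to 99, including pairs that never occur.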
Steve on 1 Apr 2023
Thank you Walter. This method worked for me as well. Cheers
