How to search a substring in a list of strings?
조회 수: 60 (최근 30일)
이전 댓글 표시
I have {'xx', 'abc1', 'abc2', 'yy', 'abc100'} and I would like to search 'abc' and get back {'abc1', 'abc2', 'abc100'}. Is it possible to do this in a simple way without a for cycle?
댓글 수: 0
답변 (4개)
Jos (10584)
2018년 1월 29일
In recent releases you can use startsWith
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'}
tf = startsWith(A,'abc')
B = A(tf)
See the documentation on string functions for many other utilities that may be useful for you.
댓글 수: 0
Stephen23
2018년 1월 29일
Much faster than using cellfun or any string functions:
>> C = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
>> Z = C(strncmp(C,'abc',3));
>> Z{:}
ans = abc1
ans = abc2
ans = abc100
댓글 수: 1
Jan
2018년 1월 29일
편집: Jan
2018년 1월 29일
If you want to search at the start of the strings only, this is efficient:
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
B = s(strncmp(s, 'abc', 3));
Some timings:
% Some larger test data:
A = repmat({'xx', 'abc1', 'abc2', 'yy', 'abc100'}, 1, 1000);
S = string(A);
tic;
for k = 1:1000
tf = startsWith(A, 'abc');
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = startsWith(S, 'abc');
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = strncmp(s, 'abc', 3);
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = cellfun(@any,strfind(A, 'abc'));
B = A(tf);
end
toc
tic;
for k = 1:1000
tf = ~cellfun('isempty', strfind(A, 'abc'));
B = A(tf);
end
toc
Elapsed time is 1.492006 seconds. % startsWith(cell string)
Elapsed time is 0.308345 seconds. % startsWith(string)
Elapsed time is 0.018157 seconds. % strncmp
Elapsed time is 8.095714 seconds. % cellfun(@any, strfind)
Elapsed time is 1.706694 seconds. % cellfun('isempty', strfind)
Note that cellfun method searches for the substring anywhere in the strings, while the two other methods search at the start only. With modern string methods this would be:
tf = contains(A, 'abc');
This has an equivalent speed as startsWith.
@MathWorks: strncmp is 17 times faster for cell strings than startsWith for strings. The conversion from cell strings to strings inside startsWith let it run 65 times slower than strncmp. There is a great potential for improvements.
Fangjun Jiang
2018년 1월 29일
s={'xx', 'abc1', 'abc2', 'yy', 'abc100'};
index=cellfun(@any,strfind(s,'abc'));
s(index)
댓글 수: 2
Jan
2018년 1월 29일
편집: Jan
2018년 1월 29일
@Matt J: But cellfun is at least a fast C-mex function. Every function to process a cell string must contain a loop anywhere. The problem is using cellfun with an expensive anonymous function. About 4 times faster (but still slower than startsWith, timings see my answer):
index = ~cellfun('isempty', strfind(s, 'abc'));
참고 항목
카테고리
Help Center 및 File Exchange에서 Data Type Conversion에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!