
Comparing lists of strings

Neil, 17 Nov 2012
OK, so I have two lists of, say, names: List_one and List_two.
- List_one is shorter than List_two, and both are cell arrays.
I want to find the indices at which List_two contains elements of List_one. So far I've been using strcmpi, but because the lists have different sizes (and different orders) I have to compare element by element, and I can't believe that is the most efficient solution. Any tips?
Currently I'm doing something like:
ind = false(size(List_two));
for i = 1:length(List_one)
    tf = strcmpi(List_one(i), List_two);
    ind = ind | tf;   % afterwards, List_two(ind) holds the entries that also appear in List_one (in List_two's order)
end
I just can't imagine that's the best way. I've spent a long time reading about ways to compare lists, but I still haven't found anything satisfactory. One more related question:
If I have a list of names and someone permutes it (giving me only the new list, not the indices of the new ordering), what is the best way to recover the indices?
That is, given List_two(indices) = List_two_permuted, how do I find the permutation?

Accepted Answer

Jan, 18 Nov 2012 (edited 18 Nov 2012)
Both tasks can be solved with the fast C-MEX function CStrAinBP from the File Exchange (FEX):
List1 = {'A', 'b', 'cd', 'eFG', 'Miss'};
List2 = {'b', 'A', 'eFG', 'cd', 'Q', 'A'};
[Ex, Seq] = CStrAinBP(List1, List2);
% >> Ex = 1 2 3 4
% >> Seq = 2 1 4 3
% Now: isequal(List1(Ex), List2(Seq))
Repeated strings are taken into account, and passing 'i' as the third input makes the comparison case-insensitive.
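If the FEX submission is not available, here is a minimal sketch that produces equivalent Ex and Seq outputs with the built-in ismember only. This is not how CStrAinBP works internally. For repeated entries such as 'A', which index Loc reports depends on the release (as far as I know, R2013a and later return the first occurrence, older releases the last), so Seq can differ from CStrAinBP's output while List1(Ex) and List2(Seq) still agree.

```matlab
% Built-in alternative to [Ex, Seq] = CStrAinBP(List1, List2)
List1 = {'A', 'b', 'cd', 'eFG', 'Miss'};
List2 = {'b', 'A', 'eFG', 'cd', 'Q', 'A'};

[Tf, Loc] = ismember(List1, List2);  % Tf(k): List1{k} occurs in List2, Loc(k): where
Ex  = find(Tf)                       % indices into List1 that were found: 1 2 3 4
Seq = Loc(Tf)                        % corresponding indices into List2
isequal(List1(Ex), List2(Seq))       % true

% Case-insensitive variant: compare lower-case copies of both lists
[TfI, LocI] = ismember(lower(List1), lower(List2));
```

Lower-casing both lists is the simplest way to get strcmpi-like behaviour from ismember, at the cost of creating two temporary copies.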
Speed (measured with R2009a, 184 folder names, Core2Duo 2.3 GHz):
List1 = regexp(path, pathsep, 'split');
List2 = List1(randperm(length(List1)));
tic; for k=1:1000, [Tf, Loc] = ismember(List1, List2); end; toc
% >> 1.747856 sec
tic; for k=1:1000, [Ex, Seq] = CStrAinBP(List1, List2); end; toc
% >> 0.071419 sec
For large string lists (e.g. 100,000 strings), the sorting done by ismember pays off, because a binary search is cheaper than the linear search used by CStrAinBP.
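For the second question (recovering a permutation), the second output of ismember already contains the indices. A minimal sketch, assuming every name in the list is unique (with duplicates the permutation is not uniquely defined); the names below are made up for illustration:

```matlab
% Recover p such that List_two(p) equals List_two_permuted
List_two = {'Ann', 'Bob', 'Cy', 'Dan'};
List_two_permuted = List_two([3 1 4 2]);     % someone reorders the list

[found, p] = ismember(List_two_permuted, List_two);
assert(all(found));                          % every permuted entry came from List_two
p                                            % 3 1 4 2
isequal(List_two(p), List_two_permuted)      % true
```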

More Answers (1)

Azzi Abdelmalek, 17 Nov 2012 (edited 17 Nov 2012)
List_one = {'Bob', 'Eve'};
List_two = {'Ann', 'Bob', 'Cy', 'Eve', 'Dan'};
idx = find(ismember(List_two, List_one))   % positions in List_two whose entry is in List_one
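The strcmpi loop from the question can be collapsed into a single ismember call. A sketch with made-up names; lower is applied to both lists so the match ignores case the way strcmpi does:

```matlab
List_one = {'bob', 'EVE'};
List_two = {'Ann', 'Bob', 'Cy', 'Eve', 'Dan'};

% Loop version, as in the question (case-insensitive via strcmpi)
ind = false(size(List_two));
for i = 1:numel(List_one)
    ind = ind | strcmpi(List_one{i}, List_two);
end

% Single-call version: lower-case both lists, then one ismember call
tf = ismember(lower(List_two), lower(List_one));
isequal(tf, ind)   % true
idx = find(tf)     % 2 4
```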
