Matlab. Find the indices of a cell array of strings with characters all contained in a given string (without repetition)

I have one string and a cell array of strings.
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
I want to obtain:
idx = [2, 3, 6];
I have written a very long piece of code that:
1. finds the elements whose length is not greater than length(str);
2. removes the elements containing characters not included in str;
3. finally, for each remaining element, checks the characters one by one.
Essentially, it is close to brute force and runs very slowly. Is there a simpler, faster way to do this?

Accepted Answer

Mohsen Nosratinia on 13 Oct 2013
You can sort the characters of each string and then match the result against a regular expression built from str. For your example the pattern will be ^a{0,2}c{0,1}t{0,1}z{0,1}$:
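u = unique(str);                                        % sorted distinct characters of str
t = ['^' sprintf('%c{0,%d}', [u; histc(str,u)]) '$'];   % one 'c{0,n}' term per character
s = cellfun(@sort, dic, 'uni', 0);                      % sort the characters of each element
idx = find(~cellfun('isempty', regexp(s, t)));          % elements that match the pattern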
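The following is a minimal sketch of the same sort-and-match idea, written out step by step. It is not the answerer's exact code: the variable counts is introduced here for illustration, and the characters are converted with double() so the sketch does not rely on implicit char/double concatenation. It also assumes str contains no regular expression metacharacters (such as '.' or '*'), which would need escaping.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz', 'aaaz'};

u = unique(str);                          % sorted distinct characters: 'actz'
counts = histc(double(str), double(u));   % occurrences of each character in str

% Build one 'c{0,n}' term per distinct character, anchored at both ends.
t = ['^' sprintf('%c{0,%d}', [double(u); counts]) '$'];

% Sorting each element puts its characters in the same order as the pattern.
s = cellfun(@sort, dic, 'UniformOutput', false);
idx = find(~cellfun('isempty', regexp(s, t)));
```

With this extended dic, 'aaz' is accepted because 'a' occurs twice in str, while 'aaaz' is rejected.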

More Answers (3)

Cedric on 13 Oct 2013
Edited: Cedric on 13 Oct 2013
Here is another solution, just for the fun of it:
>> spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
>> str_spec = spectrum(str) ;
Then
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6
>> dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz', 'aaaz'} ;
>> find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
ans =
2 3 6 8
where we see that 'aaz' was taken into account but not 'aaaz', as it has more a's than str.
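For readers less familiar with accumarray, this sketch unpacks what the spectrum vector holds. It assumes letters only (character codes 65 to 122), as in the answer above. The names spec, letters, counts, fits_aaz and fits_aaaz are introduced here for illustration, and the subscripts are built with double(s(:)) rather than s.'-64, which should give the same counts.

```matlab
str = 'actaz';

% Slot k counts occurrences of char(k + 64), so 'A' (code 65) lands in
% slot 1 and 'z' (code 122) lands in slot 58.
spectrum = @(s) accumarray(double(s(:)) - 64, 1, [58, 1]);

spec = spectrum(str);
letters = char(find(spec).' + 64);   % characters that occur in str
counts  = spec(spec > 0).';          % how often each of them occurs

% A candidate fits when none of its counts exceeds the matching count in str.
fits_aaz  = all(spectrum('aaz')  <= spec);
fits_aaaz = all(spectrum('aaaz') <= spec);
```

Here letters and counts pair up position by position, and the two fits_ values mirror the 'aaz' versus 'aaaz' comparison described above.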
  Comments: 2
N/A on 13 Oct 2013
Cedric, when I try your code it returns only 2.
When I use my real cell of strings, it returns this error:
??? Error using ==> accumarray
First input SUBS must contain positive integer subscripts.
Error in ==> @(s)accumarray(s.'-64,ones(size(s)),[58,1])
Error in ==> @(s)all(spectrum(s)<=str_spec)
I am trying to understand why this is happening.
Cedric on 13 Oct 2013
Edited: Cedric on 13 Oct 2013
Does a copy/paste of the following really return only 2?
str = 'actaz' ;
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'} ;
spectrum = @(s) accumarray(s.'-64, ones(size(s)), [58,1]) ;
str_spec = spectrum(str) ;
find(cellfun(@(s)all(spectrum(s)<=str_spec), dic))
It shouldn't.
In your real cell array of strings, do you have special characters, digits or spaces? I wrote this solution assuming that there would be only letters (lower or upper case). If there are spaces and digits, please use the updated version of spectrum:
spectrum = @(s) accumarray(s.'-31, ones(size(s)), [91,1]) ;
If there can be any special character, just use
spectrum = @(s) accumarray(s.'-0, ones(size(s)), [256,1]) ;
PS: s.'-0 could be replaced by double(s).', but I wanted to keep the previous structure for it not to be even more confusing ;-)
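As a rough diagnostic sketch, one could first locate the entries that fall outside the letters-only range before choosing a spectrum size. The dic entries and the str used below are hypothetical examples, and out_of_range is a name introduced here, not part of the original code.

```matlab
dic = {'ac', 'a c', 'z9', 'Zt'};   % hypothetical entries, two with non-letters

% Entries with a character outside 'A'..'z' (codes 65..122) would produce
% subscripts outside 1..58 in the letters-only spectrum.
out_of_range = find(cellfun(@(s) any(s < 'A' | s > 'z'), dic));

% Wider variant from the comment above, written with double() for the
% subscripts; it covers character codes 1 to 256.
spectrum = @(s) accumarray(double(s(:)), 1, [256, 1]);
str_spec = spectrum('a c9z');      % hypothetical str with a space and a digit
idx = find(cellfun(@(s) all(spectrum(s) <= str_spec), dic));
```

The entries flagged in out_of_range are the ones that would trigger the accumarray subscript error with the [58,1] version.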



Jos (10584) on 13 Oct 2013
Take a look at my function MATCHROW:
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
indices = find(cellfun(@(x) matchrow(str,x),dic))
The warning it issues (when numel(str) > numel(dic{k})) can be ignored. The function can be found here: http://www.mathworks.com/matlabcentral/fileexchange/14520

Azzi Abdelmalek on 13 Oct 2013
Edited: Azzi Abdelmalek on 13 Oct 2013
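str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz'};
% Keep the elements that are no longer than str.
idx1 = find(cellfun(@numel, dic) <= numel(str))
dic1 = dic(idx1)
% Keep the elements that contain no repeated character.
idx2 = cellfun(@(x) numel(unique(x)) == numel(x), dic1)
idx3 = idx1(idx2)
dic3 = dic1(idx2)
% Keep the elements whose characters all occur in str.
idx4 = cellfun(@(x) all(ismember(x, str)), dic3)
indices = idx3(idx4)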
  Comments: 2
N/A on 13 Oct 2013
Dear Azzi,
your solution is amazing and very elegant, although apparently it is also a bit slow (but faster than the code that I am currently using).
There is only one problem. Suppose that
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};
the solution should be
indices = [2 3 6 8];
because 'a' appears twice in str. I realize I should have made clear that a character may appear at most n times in an element of dic if it appears n times in str, i.e., a character cannot be repeated more often than it is repeated in str.
Do you think your code can be adjusted to this condition?
Anyway, thank you a lot for your prompt answer!
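One possible adjustment, offered as a sketch and not as Azzi's own revision, is to replace the no-repetition test with a per-character count comparison. The helper name fits is introduced here for illustration.

```matlab
str = 'actaz';
dic = {'aaccttzz', 'ac', 'zt', 'ctu', 'bdu', 'zac', 'zaz', 'aaz'};

% For every distinct character c of x, x may contain c at most as many
% times as str does. A character absent from str has count 0 there, so
% any element containing it is rejected.
fits = @(x) all(arrayfun(@(c) sum(x == c) <= sum(str == c), unique(x)));

indices = find(cellfun(fits, dic));
```

For this dic, 'aaz' is now kept because 'a' occurs twice in str, while 'zaz' is still rejected because 'z' occurs only once.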

