How to search for multiple words anywhere in a sentence?
I want to search for three words, "battery", "power" and "failure". All three must appear in the sentence, in any order, for the cell to be copied.
I tried:
j=1;
k=1;
D=alldata(:,126:130);
idx = cellfun('isclass',D,'char');
idx(idx)=~cellfun('isempty',regexpi(D(idx),'battery|power|failure')) ;
data = alldata(any(idx,2),:);
Notdata = alldata(~any(idx,2),:); % rows in which no cell matched any of the words
But it matches every cell that contains any one of the three words.
How can I find the cells that contain all three words, in any order?
Answers (3)
the cyclist
19 Sep 2015
The most straightforward way, it seems to me, is to do the regexp search three times, once for each word, and then copy the cells where all three match. I am not sure there is a way to do an "and" match in the same way one can do an "or" match like you have done.
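A minimal sketch of that three-pass approach, assuming (as in the question) that the text sits in a cell array in which non-char cells should simply count as non-matches. The small `D` below is made-up sample data standing in for `alldata(:,126:130)`, and `words`, `hasAll` and `rowHasAll` are illustrative names. Matching is plain case-insensitive substring matching, so "Xbattery" would still count as "battery".

```matlab
% Hypothetical sample data standing in for alldata(:,126:130)
D = {'Battery power failure reported', 'ok';
     'power failure only',             42;
     'no match here',                  'failure of battery POWER'};
words = {'battery', 'power', 'failure'};

isChar = cellfun(@ischar, D);   % only text cells can match
hasAll = isChar;                % start as true for every text cell
for k = 1:numel(words)
    % One regexpi pass per word; 'once' returns '' where the word is absent
    hit = false(size(D));
    hit(isChar) = ~cellfun('isempty', regexpi(D(isChar), words{k}, 'once'));
    hasAll = hasAll & hit;      % keep cells that matched every word so far
end
rowHasAll = any(hasAll, 2);     % rows with at least one cell containing all three
```

The matching rows could then be copied with `data = alldata(rowHasAll,:)` and the others with `alldata(~rowHasAll,:)`, mirroring the code in the question.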
per isakson
19 Sep 2015
Edited: per isakson
20 Sep 2015
Try this:
sentence_1 = 'abc battery def power ghi failure';
typo_str_1 = 'abc battery def power ghi faiXure';
sentence_2 = 'Battery def power ghi failure.';
typo_str_2 = 'abc Xbattery def power ghi failure';
words = {'battery','power','failure'};
% For each word, test whether it occurs as a whole word (\< and \> are word
% anchors, regexpi ignores case); each result is a 1x3 logical, one per word
is1 = cellfun( @(str) not(isempty(regexpi( sentence_1, ['\<',str,'\>'] ))), words ); % [1 1 1]
is2 = cellfun( @(str) not(isempty(regexpi( typo_str_1, ['\<',str,'\>'] ))), words ); % [1 1 0]
is3 = cellfun( @(str) not(isempty(regexpi( sentence_2, ['\<',str,'\>'] ))), words ); % [1 1 1]
is4 = cellfun( @(str) not(isempty(regexpi( typo_str_2, ['\<',str,'\>'] ))), words ); % [0 1 1]
% all(is1) is true only when all three words were found
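The per-word results above still have to be reduced with `all` to answer the "all three words" question. A minimal sketch that wraps this into a reusable predicate; `hasAllWords`, `sentences` and `isMatch` are hypothetical names, and the `\<`/`\>` anchors mean that "Xbattery" does not count as "battery".

```matlab
words = {'battery', 'power', 'failure'};

% True when every word occurs as a whole word (case-insensitive) in str.
% 'once' makes regexpi return the first match (or '') instead of a cell.
hasAllWords = @(str) all(cellfun( ...
    @(w) ~isempty(regexpi(str, ['\<', w, '\>'], 'once')), words));

sentences = {'abc battery def power ghi failure', ...
             'abc battery def power ghi faiXure', ...
             'Battery def power ghi failure.', ...
             'abc Xbattery def power ghi failure'};

% Apply the predicate to every sentence; the result is a 1x4 logical row
isMatch = cellfun(hasAllWords, sentences);
```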
 
A different approach
>> cssm(1)
Elapsed time is 0.001078 seconds.
ans =
1 0 0 1 0 0
>> cssm(1e3);
Elapsed time is 0.791887 seconds.
where
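function has_all_three = cssm( N )
sentence_1 = 'Abc battery def power ghi failure.';
typo_str_1 = 'Abc battery def power ghi faiXure.';
multistr_1 = 'Abc battery def power ghi battery.';
sentence_2 = 'Battery def failure ghi power jkl.';
typo_str_2 = 'Abc Xbattery def power ghi failure';
multistr_2 = 'Abc power def power ghi power jkl.';
%
test_sentences = {sentence_1,typo_str_1,multistr_1,sentence_2,typo_str_2,multistr_2};
% Replicate the six test sentences N times to time a larger corpus
text_corp = repmat( test_sentences, [N,1] );
tic
% One pass: collect every whole-word occurrence of any of the three words.
% The alternation is grouped so that \< and \> apply to all three words.
cac = regexpi( text_corp, '\<(battery|power|failure)\>', 'match' );
% A sentence qualifies when its matches contain three distinct words,
% ignoring case (repeats such as 'power ... power ... power' count once)
has_all_three = cellfun( @(c) length(unique(lower(c)))==3, cac );
toc
end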
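Why the grouping of the alternation matters: `|` has the lowest precedence in a regular expression, so in `'\<(battery)|(power)|(failure)\>'` the `\<` attaches only to the first alternative, the `\>` only to the last, and `power` gets no word anchors at all. Putting the whole alternation inside one group applies both anchors to every word. A small comparison on made-up strings:

```matlab
loose  = '\<(battery)|(power)|(failure)\>';   % anchors bind to the outer alternatives only
strict = '\<(battery|power|failure)\>';       % anchors apply to all three words

% 'power' inside 'superpower' is not at a word start, yet the loose pattern accepts it
m1 = regexpi('superpower', loose,  'match');
m2 = regexpi('superpower', strict, 'match');

% 'batteryX' satisfies \<battery, and the loose pattern never checks the word end
m3 = regexpi('batteryX', loose,  'match');
m4 = regexpi('batteryX', strict, 'match');
```

For the six test sentences in `cssm` both patterns happen to give the same result, but on real text the grouped form is the one that enforces whole words.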
12 Comments
Amr Hashem
19 Sep 2015
per isakson
19 Sep 2015
Edited: per isakson
19 Sep 2015
"... but that's not what I want"
Then you need to explain better what you want, and also why my hint isn't useful to you.
Amr Hashem
19 Sep 2015
John D'Errico
19 Sep 2015
Because he wants a magic solution.
Amr Hashem
19 Sep 2015
Edited: Amr Hashem
19 Sep 2015
per isakson
19 Sep 2015
Edited: per isakson
19 Sep 2015
The task is: "search for three words 'Battery, power, failure'; the three must exist in the sentence in any order". Is that correct?
"I have about (57000*6 cell)" How is that cell array related to alldata(:,126:130)? With one sentence per cell, you have 0.342 million sentences(?). What is an acceptable execution time?
"I only need to modify this line:" At the very least, you need to explain what you expect the line to do! Why should I guess?
"I only want to solve this problem" What problem? Why only? What makes you think that it is even possible to accomplish the task with code along the lines you propose? I don't think it is possible!
BTW: should "Xbattery" match "battery"?
Amr Hashem
19 Sep 2015
per isakson
20 Sep 2015
"I am now asking is it possible to modify the code or not?" I repeat: I don't think it is possible!
per isakson
20 Sep 2015
Edited: per isakson
20 Sep 2015
Three words in any order is a tough job for regexp. "To do the regexp search three times, once for each word" is a sound approach, and I cannot understand why you dismissed it.
per isakson
20 Sep 2015
I added new code to my answer.
Amr Hashem
20 Sep 2015
Amr Hashem
20 Sep 2015
1 Comment
Cedric
22 Sep 2015
This can be simplified, as developed in my answer, which I have moved below as a comment:
Here is an alternate solution:
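keywords = {'battery', 'power', 'failure'} ;
% Dummy data: V_ marks entries that should match, I_ entries that should not
allCells = {'V_batterypowerfailure', 'I_batterypwerfailure'; ...
            'V_batterypowerfailure', 'I_atterypowerfailure'; ...
            'I_batterypowerfailre', 'V_batterypowerfailure'} ;
ids = 1 : numel( allCells ) ;   % linear indices of cells still in the running
for k = 1 : numel( keywords )
    % strfind returns [] for cells that do not contain the keyword (case-sensitive)
    isFound = ~cellfun( 'isempty', strfind( allCells(ids), keywords{k} )) ;
    ids = ids(isFound) ;        % keep only cells that contain this keyword
end
validCells = allCells(ids) ;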
Notice that it works on a pool of cells that shrinks with each keyword: once a keyword is not found in a cell, there is no point in testing that cell for the remaining keywords. I prefixed the valid entries of the dummy data set with V_ and the invalid entries with I_ to simplify the final check.
If you need a case-insensitive solution, replace
strfind( allCells(ids), keywords{k} )
with
regexpi( allCells(ids), keywords{k}, 'once' )
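Put together, a case-insensitive version of that loop might look like the sketch below. The mixed-case `allCells` entries are invented test data that keep the V_/I_ convention, and the early `break` is an optional addition. Because the keywords are passed to `regexpi` as patterns, any regex metacharacters in them would need escaping; plain words like these are unaffected.

```matlab
keywords = {'battery', 'power', 'failure'};
allCells = {'V_BatteryPowerFailure', 'I_batterypwerfailure'; ...
            'V_batteryPOWERfailure', 'I_Failure only'};

ids = 1 : numel(allCells);          % linear indices of cells still in the running
for k = 1 : numel(keywords)
    % regexpi with 'once' returns '' for cells where the keyword is absent
    isFound = ~cellfun('isempty', regexpi(allCells(ids), keywords{k}, 'once'));
    ids = ids(isFound);             % drop cells that lack this keyword
    if isempty(ids), break; end     % nothing left to test
end
validCells = allCells(ids);
```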