How to find the exact location of a word in a string?

Question

Yunfei Zhang on 13 Feb 2016

https://kr.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string

Commented: Guillaume on 13 Feb 2016

I have the string 'chemical engineering is a challenge for electrical engineer'. I have been using the 'strfind' function to find the exact location of the word 'engineer'. However, the word 'engineering' is also included in my results. How can I get only the location of the word 'engineer' and not 'engineering'?

 list='chemical engineering is a challenge for electrical engineer';
 temp=strfind(list,'engineer')

The result is

temp =
      10    52


Answer 1 (accepted answer)

Star Strider on 13 Feb 2016

https://kr.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string#answer_209694

This regexp call will pick up only ‘engineer’:

Str = 'chemical engineering is a challenge for electrical engineer';
idxs = regexp(Str, 'engineer\>')
idxs =
    52
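For context, `\>` is MATLAB's end-of-word anchor, which is why 'engineer\>' cannot match inside 'engineering'. It can still match the tail of a longer word, so adding the start-of-word anchor `\<` may be worth considering. A minimal sketch, assuming a made-up sentence that extends the one from the question:

```matlab
% Hypothetical sentence, extended from the question's string for illustration
Str = 'chemical engineering is a challenge for a bioengineer or an engineer';

% End-of-word anchor only: also matches the tail of 'bioengineer'
idxEnd = regexp(Str, 'engineer\>');

% Start- and end-of-word anchors: matches only the standalone word
idxWhole = regexp(Str, '\<engineer\>');

disp(idxEnd)    % start indices of both matches
disp(idxWhole)  % start index of the standalone word only
```

Whether the extra `\<` is needed depends on the word list; if a longer word ending in a dictionary word should not be counted, it probably is.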


Yunfei Zhang on 13 Feb 2016

Edited: Yunfei Zhang on 13 Feb 2016

Sorry for the confusion. I simplified the question before asking it. 'pre' is a cell array containing 20 documents, each of which is a long string. 'word' is a cell array containing the 1099 words that remain in these 20 documents after removing stopwords. What I want to do is construct a 20*1099 matrix showing each word's frequency in each document. This led to the problem mentioned above: 'engineer' may get a higher frequency than 'engineering' in the word dictionary. However, I think the function you suggested is the correct way to find the location of each word. After finding the correct locations of a word like 'engineer', I can calculate its frequency and store it at the corresponding position using the code below. Guillaume provided me with a method of building the regular expression for each word, and it works. However, it trades time for accuracy: it takes much longer when processing a large number of articles (when 'pre' contains a large number of long strings).

if ~isempty(temp)
    docum(i,j) = size(temp,2); % number of matches found for word j in document i
end

Guillaume on 13 Feb 2016

Edited: Guillaume on 13 Feb 2016

You can prebuild the regular expressions before the loops if you wish.

word = strcat(word, '\>')
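One thing to watch when prebuilding patterns this way: if a dictionary word contains a regular-expression metacharacter (punctuation such as '.'), it will be interpreted as part of the pattern. A minimal sketch of escaping the words first with `regexptranslate`, assuming a made-up word list and document (`word`, `doc`, and the other variable names here are only for illustration):

```matlab
% Hypothetical word list; 'u.s' contains the regex metacharacter '.'
word = {'engineer', 'u.s'};
doc  = 'an engineer from the u.s fixes bugs in engineering software';

% Escape metacharacters in each word, then append the end-of-word anchor
% once, before any loop over documents
escaped  = cellfun(@(w) regexptranslate('escape', w), word, 'UniformOutput', false);
patterns = strcat(escaped, '\>');

% Count matches of each prebuilt pattern in the document
counts = cellfun(@(p) numel(regexp(doc, p)), patterns);

% For comparison: the unescaped pattern also matches the 'ugs' in 'bugs'
rawCount = numel(regexp(doc, 'u.s\>'));
```

Here the escaped pattern counts 'u.s' once, while the unescaped one counts it twice.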

Yunfei Zhang on 13 Feb 2016

Thank you! That helps a lot in controlling the processing time, as I also want to do feature selection and clustering on my data.


Answer 2

Guillaume on 13 Feb 2016

https://kr.mathworks.com/matlabcentral/answers/267985-how-to-find-the-exact-location-of-a-word-in-a-string#answer_209710

Edited: Guillaume on 13 Feb 2016

Another option, since the words you're trying to match are always delimited by spaces or the end of the sentence (other punctuation marks are already embedded in the words), is to add a space to the end of each word and to the end of each sentence. That way, 'engineer ' no longer matches 'engineering ':

tic
docum = zeros(numel(pre), numel(word));
word2 = strcat(word, {' '}); %strcat removes trailing ' ' if it's not in a cell array
pre2 = strcat(vertcat(pre{:}), {' '}); %why is your pre a cell array of 1x1 cell arrays?
for widx = 1:numel(word)
   docum(:, widx) = cellfun(@numel, strfind(pre2, word2{widx}));
end
toc

I'm not convinced it's going to be faster than regexp:

tic
docum = zeros(numel(pre), numel(word));
word2 = strcat(word, '\>');
pre2 = vertcat(pre{:}); %why is your pre a cell array of 1x1 cell arrays?
for widx = 1:numel(word)
   docum(:, widx) = cellfun(@numel, regexp(pre2, word2{widx}));
end
toc

In my testing, both take more or less the same time.
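To check that the two approaches agree, they can be run side by side on a small example. This is only a sketch: `docs` stands in for `pre2` and is assumed to be a plain column cell array of strings, and the sample sentences are made up.

```matlab
% Hypothetical sample data: 3 documents (column cell array) and 2 dictionary words
docs = {'chemical engineering is a challenge for electrical engineer'; ...
        'an engineer met an engineer'; ...
        'engineering only'};
word = {'engineer', 'engineering'};

% Approach 1: trailing space + strfind
docsSp = strcat(docs, {' '});
wordSp = strcat(word, {' '});
countsStrfind = zeros(numel(docs), numel(word));
for widx = 1:numel(word)
    countsStrfind(:, widx) = cellfun(@numel, strfind(docsSp, wordSp{widx}));
end

% Approach 2: end-of-word anchor + regexp
wordRe = strcat(word, '\>');
countsRegexp = zeros(numel(docs), numel(word));
for widx = 1:numel(word)
    countsRegexp(:, widx) = cellfun(@numel, regexp(docs, wordRe{widx}));
end

% Both matrices should be identical for this sample
sameResult = isequal(countsStrfind, countsRegexp);
```

On real data the two could differ if a word is followed directly by punctuation rather than a space, so this check is worth repeating on a subset of the actual documents.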


Star Strider on 13 Feb 2016

@Guillaume — Thank you. I had to be away for a few minutes.

Guillaume on 13 Feb 2016

@Yunfei, what is probably having the most effect on the processing speed is that I apply the regexp or strfind to all the sentences at once. There is only one loop, looping over the individual words.
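If the per-word loop is still too slow, a different technique (not the one used in the answer above) might be to split each document into tokens once and look all tokens up in the dictionary with `ismember`. This is a rough sketch under the assumption that words are separated by single spaces; it loops over documents instead of words, and the sample data is made up.

```matlab
% Hypothetical sample data
docs = {'an engineer met an engineer'; 'engineering only'};
word = {'engineer', 'engineering'};

counts = zeros(numel(docs), numel(word));
for didx = 1:numel(docs)
    tokens = strsplit(docs{didx}, ' ');   % split the document on spaces
    [~, loc] = ismember(tokens, word);    % loc: index into word, 0 if not in dictionary
    % Count how many tokens map to each dictionary index (zeros never match)
    counts(didx, :) = sum(bsxfun(@eq, loc(:), 1:numel(word)), 1);
end
```

Note that this matches whole tokens only, so it behaves like anchoring both the start and the end of the word, which is slightly stricter than 'engineer\>'.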
