How do I compare a string with a #single word?

Hello, I am trying to check whether a string contains a word preceded by a hashtag. For example, if the word ‘retro’ is in the text file and ‘#retro’ appears in str:
str = 'It was #crazy #retro.'
word = 'retro'
How do I compare str with word, including the hashtag? I tried using
strfind(lower(str), '#line2')
but it gave me an empty vector.
Thank you.
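One likely reason for the empty result is that a quoted name such as '#line2' is searched for as literal text rather than as the contents of a variable. A minimal sketch, assuming the word is held in a variable called word as in the example above, builds the search text by concatenation instead:

```matlab
% Sketch: a quoted variable name is literal text, so concatenate '#' with the word.
str  = 'It was #crazy #retro.';
word = 'retro';

literalHit = strfind(lower(str), '#word');            % looks for the text "#word": empty
taggedHit  = strfind(lower(str), ['#' lower(word)]);  % looks for "#retro": index 15
```

Note that this plain substring search would also report a hit inside a longer hashtag such as #retrorocket.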

Accepted Answer

Guillaume on 27 Feb 2015
Edited: Guillaume on 28 Feb 2015

Votes: 2

All of the solutions proposed so far have the problem that they'll find the hashtag #retro in the hashtag #retrorocket, which I don't think is wanted.
At the end of the day, a very good parser for strings was invented long ago: it's called a regular expression. Here is a way to get your matches without the need for a loop:
hiplist = {'denim'; 'vinyl'; 'retro'};
teststr = 'the denim #vinyl was #crazy #retro but the #retrorocket went backward';
%build regular expression from hiplist:
regpattern = sprintf('\\<(?<=#)(%s)\\>', strjoin(hiplist, '|'));
matches = regexp(teststr, regpattern, 'match')
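The pattern I've built only matches hashtags that are whole words from the list: \< and \> anchor the match at word boundaries, so #retro is found but #retrorocket is not. Because a word boundary also occurs before punctuation, a hashtag followed by a punctuation mark (e.g. #retro.) is still detected. Restricting matches to hashtags surrounded by whitespace (regular spaces, newlines, tabs, etc.) would be a fairly small change to the regex if wanted.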
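To make the generated pattern concrete, the sketch below prints what sprintf produces for the three-word list and runs it on the same test string. The values in the comments come from tracing the pattern by hand, and the list is written as a row cell array here purely as a precaution.

```matlab
% Sketch: inspect the generated pattern and the matches for the example above.
hiplist = {'denim', 'vinyl', 'retro'};
teststr = 'the denim #vinyl was #crazy #retro but the #retrorocket went backward';

regpattern = sprintf('\\<(?<=#)(%s)\\>', strjoin(hiplist, '|'));
disp(regpattern)    % \<(?<=#)(denim|vinyl|retro)\>

matches = regexp(teststr, regpattern, 'match');
% matches holds 'vinyl' and 'retro': "denim" has no '#', "#crazy" is not in the
% list, and "retro" inside "#retrorocket" does not end at a word boundary.
```

The lookbehind (?<=#) requires the '#' but leaves it out of the match, so the returned strings are the bare words.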

8 Comments

Kratos on 27 Feb 2015
Thank you for the help but the problem is I have 14 different words and I have to compare it to the same line. Since I am going through the same line 14 times, it counts hostage every single time. How do I make sure that I count it only one time?
Guillaume on 27 Feb 2015
I don't understand what you're saying. In my example it finds all the hashtags at once. The line is only processed once even if you want to find 14 hashtags (is that what you mean by hostage?).
If the line contains the same hashtag several times, and you want it only once, remove the duplicates with unique.
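As a small illustration of the unique suggestion, assuming the regexp call returned the same hashtag twice (the cell array below is made up for the example):

```matlab
% Sketch: drop repeated hashtags from a list of matches.
matches    = {'retro', 'vinyl', 'retro'};   % hypothetical regexp output
uniqueTags = unique(matches);               % 'retro' and 'vinyl', each once
numTags    = numel(uniqueTags);             % 2
```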
If you're asking something, be more clear and show an example.
Kratos on 28 Feb 2015
Edited: Kratos on 28 Feb 2015
Okay, so here is the question: "You should also award one faux-hipster point for every hashtag used in the conversation. A hashtag is defined as the pound (#) followed immediately by any non-space character. Finally, you are able to have overlapping points. For example, if the word ‘denim’ were in the hip words file, and ‘#denim’ appeared in the conversation file, you would need to award 2 points because it is a hipster word and a hashtag." I do this to count the #:
if strfind(lower(line1), '#')
    a = length(strfind(lower(line1), '#'));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
end
and I use the code that you wrote to find #denim. I am only supposed to get 2 points for both of them together. Since I run both pieces of code in the while loop, I get 1 for the # and 2 for #denim. I am supposed to get only 2. How do I make sure I get 2?
I don't know why you keep insisting on using loops, strfind and so on, when I showed that none of them are necessary.
Can you also make sure that what you write makes sense? You're talking about a while loop, but there isn't any such loop in your example. It makes it that much more difficult to help you if I can't understand what you're asking.
My understanding is that you want to find all the hashtags and award more points to some of them. It's very simple:
sentence = 'There are #several #hashtags in this #sentence and #some of them are #hip'
hiptags = {'sentence', 'hip'};
hashtags = unique(regexp(sentence, '\<(?<=#).*?\>', 'match'))
hashpoints = ismember(hashtags, hiptags) + 1
totalpoints = sum(hashpoints)
If that's still not what you want, then provide an example of input and expected result rather than some code that doesn't do what you want.
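For reference, tracing the snippet above by hand gives the intermediate values shown in the comments below. This is a sketch of the same three steps, relying on unique returning its result in sorted order, which is its default behaviour.

```matlab
% Sketch: the same scoring steps, with hand-traced values in the comments.
sentence = 'There are #several #hashtags in this #sentence and #some of them are #hip';
hiptags  = {'sentence', 'hip'};

hashtags    = unique(regexp(sentence, '\<(?<=#).*?\>', 'match'));
% hashtags: 'hashtags' 'hip' 'sentence' 'several' 'some'
hashpoints  = ismember(hashtags, hiptags) + 1;   % [1 2 2 1 1]
totalpoints = sum(hashpoints);                   % 7
```

Every hashtag earns one point, and the two that also appear in hiptags earn a second one.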
We are comparing a text file containing a conversation with a text file containing words.
Conversation:
Did you go to the indie concert last night?
It was #crazy #retro. I wore my homemade denim jacket and toe socks.
Then I went and got some frozen yogurt.
Words:
indie
psl
frozen yogurt
instagram
insta
scarf
skinny jeans
toe socks
retro
homemade
mainstream
thrift store
denim
plaid
And here is my code:
function out = fauxHipster(str, hipwords)
A = [];
fh1 = fopen(str);
line1 = fgetl(fh1);
while ischar(line1)
    fh2 = fopen(hipwords);
    line2 = fgetl(fh2);
    if strfind(lower(line1), '#')
        a = length(strfind(lower(line1), '#'));
        for ind = 1:length(a)
            A = [A a(ind)];
        end
    end
    while ischar(line2)
        a = 0;
        det = strfind(line1,line2);
        if ~isempty(det)
            if det>1 & line1(det-1)=='#'
                a = a+2;
                for ind = 1:length(a)
                    A = [A a(ind)];
                end
            end
        end
        if strfind(lower(line1), line2)
            a = length(strfind(lower(line1), line2));
            for ind = 1:length(a)
                A = [A a(ind)];
            end
        end
        line2 = fgetl(fh2);
    end
    line1 = fgetl(fh1);
end
fclose(fh1);
fclose(fh2);
out = sum(A);
end
For the other conversation text file, which I haven't posted, everything works fine. But for this one, since there are two hostages and '#retro' should only be counted once, I keep getting 10 as my output. I am supposed to get 8.
It looks like you're not reading what I write. For the last time, you don't need loops nor strfind.
Your method also won't work if your hiplist contains words that start with another word in the list (e.g. retro and retrorocket).
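The prefix problem is easy to reproduce. In the sketch below (the test string is made up for illustration), strfind reports a hit for retro inside retrorocket, while a regular expression anchored at word boundaries does not:

```matlab
% Sketch: substring search versus whole-word search.
teststr = 'the #retrorocket went backward';

idx   = strfind(teststr, 'retro');              % 6: hit inside "retrorocket"
whole = regexp(teststr, '\<retro\>', 'match');  % empty: no whole-word "retro"
```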
Furthermore, your hiplist does not even match your definition of a hashtag " A hashtag is defined as the pound (#) followed immediately by any non-space character" since some of the words have space characters.
Finally, once again, what you wrote doesn't make sense. How are you supposed to get 8 when there are only two hashtags (hostages???) in the conversation?
Anyway, this and my previous answers should be enough to get you going:
conversation = fileread(conversationfile);
hipwords = strsplit(fileread(hipwordfile), '\n');
%I've no idea what you want and what you score since you
%can't express it clearly, it may be:
hashtags = unique(regexp(conversation, '\<(?<=#).*?\>', 'match'))
hashpoints = ismember(hashtags, hipwords) + 1
totalpoints = sum(hashpoints)
%or maybe
regpattern = sprintf('\\<(?<=#)(%s)\\>', strjoin(hipwords, '|'));
hashtags = regexp(conversation, regpattern, 'match')
totalpoints = numel(hashtags) * 2
%or something else
Okay, so here is the whole question for my code:
The number of phony hipsters is on the rise, and you don’t want that to encroach on your genuine creativity and frugal means of clothing yourself. So, like any true hipster would, you write a MATLAB function to determine how much of a fake hipster someone is. The function will input the name of a “.txt” file containing the text of a conversation they had and a second “.txt” filename containing words to check for in the conversation. Your function should award one faux-hipster point for each occurrence of the “hip” words in the text conversation.
You should also award one faux-hipster point for every hashtag used in the conversation. A hashtag is defined as the pound (#) followed immediately by any non-space character.
Finally, you are able to have overlapping points. For example, if the word ‘denim’ were in the hip words file, and ‘#denim’ appeared in the conversation file, you would need to award 2 points because it is a hipster word and a hashtag. Finding word matches should NOT be case sensitive (‘scarf’ is the same as ‘SCARF’).
This is what I am supposed to do.
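One reading of these rules that reproduces the expected total of 8 for the conversation posted earlier is six hip-word occurrences plus two hashtags. The sketch below is only that reading, not a confirmed solution: the conversation and word list are hard-coded rather than read from files, and it assumes a word found inside a longer word still counts.

```matlab
% Sketch under stated assumptions: 1 point per hip-word occurrence
% (case-insensitive substring) plus 1 point per hashtag.
conversation = ['Did you go to the indie concert last night? ' ...
    'It was #crazy #retro. I wore my homemade denim jacket and toe socks. ' ...
    'Then I went and got some frozen yogurt.'];
hipwords = {'indie', 'psl', 'frozen yogurt', 'instagram', 'insta', 'scarf', ...
    'skinny jeans', 'toe socks', 'retro', 'homemade', 'mainstream', ...
    'thrift store', 'denim', 'plaid'};

% Occurrences of each word, ignoring case (assumption: substrings count).
countWord  = @(w) numel(strfind(lower(conversation), lower(w)));
wordpoints = sum(cellfun(countWord, hipwords));    % 6

% Hashtags: '#' immediately followed by any non-whitespace character.
hashpoints = numel(regexp(conversation, '#\S'));   % 2

totalpoints = wordpoints + hashpoints;             % 8
```

Counting the hashtags once for the whole text, outside any loop over the words, is what keeps '#retro' from being scored again for every word in the list.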
Guillaume on 28 Feb 2015
I've given you 99% of the solution in my last answer. You only need to slightly modify one of the regular expressions (it's just removing part of it), make it case insensitive (the doc explains how), add one line to calculate the score, and you're done.
As it is an assignment, I'm not going to help you any further.


More Answers (2)

Image Analyst on 27 Feb 2015

Votes: 0

I'm not sure exactly what you're looking to do, so I'll just offer some possibilities:
str = 'It was #crazy #retro.'
word = 'retro'
hashLocations = str == '#' % Logical vector
hashIndexes = find(hashLocations) % Actual index numbers.
location = strfind(lower(str), '#retro') + 1 % Skips past #
location = strfind(lower(str), word)
In command window:
hashLocations =
0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0
hashIndexes =
8 15
location =
16
location =
16

1 Comment

Okay so here is the question: if the word ‘denim’ were in the hip words file, and ‘#denim’ appeared in the conversation file, you would need to award 2 points because it is a hipster word and a hashtag.
I did it like this
if strfind(lower(line1), line2)
    a = length(strfind(lower(line1), line2));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
elseif strfind(lower(line1), '#')
    a = length(strfind(lower(line1), '#'));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
elseif strfind(lower(line1), )
    a = length(strfind(lower(line1), ));
    for ind = 1:length(a)
        A = [A a(ind)];
    end
else
I am missing the second elseif. I don't know how to compare the hashtag and the word.
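One thing to keep in mind with this structure is that an if/elseif chain runs only the first branch whose condition is true, so a line containing both a hip word and a hashtag can never earn both kinds of point. A minimal sketch of the alternative, counting each kind independently and adding the results (line1 and line2 are hard-coded here for illustration):

```matlab
% Sketch: count word hits and '#' characters separately, then add them.
line1 = 'it was #crazy #denim';
line2 = 'denim';

wordhits = numel(strfind(lower(line1), lower(line2)));  % 1: "denim" occurs once
hashhits = numel(strfind(line1, '#'));                  % 2: two '#' characters
points   = wordhits + hashhits;                         % 3
```

In a loop over many words, the '#' count belongs outside that loop so it is added only once per line.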


Joseph Cheng on 27 Feb 2015
Edited: Joseph Cheng on 27 Feb 2015

Votes: 0

You can try something like this, where hiplist is your hipster word dictionary. In the loop, you test the test string for hits against the dictionary, then check whether the character at the index just before each hit is a pound sign, and award points for each one.
hiplist = [{'denim'};{'vinyl'};{'retro'}];
teststr = 'the denim #vinyl was #crazy #retro.';
pointsawarded=0;
for ind = 1:length(hiplist)
    det = strfind(teststr,hiplist{ind});
    if ~isempty(det)
        if det>1 & teststr(det-1)=='#'
            pointsawarded = pointsawarded+2;
        end
    end
end
disp(teststr)
disp(['got ' num2str(pointsawarded) ' points'])
Oh, and use lower so that the detection isn't case sensitive.

5 Comments

Guillaume on 27 Feb 2015
Edited: Guillaume on 27 Feb 2015
That hiplist declaration just looks weird to me. Isn't:
hiplist = {'denim'; 'vinyl'; 'retro'};
simpler?
Also, your code doesn't deal with strfind returning more than one match. You won't get any point on this one:
teststr = 'retro #retro';
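The reason no point is awarded can be seen by inspecting what strfind returns here. An if on a nonscalar condition is true only when every element is nonzero, so a single match at index 1 makes the det>1 test fail for the whole vector. A short sketch:

```matlab
% Sketch: two matches, one of them at the very start of the string.
teststr = 'retro #retro';
det = strfind(teststr, 'retro');   % [1 8]

pastStart    = det > 1;            % [false true]
allPastStart = all(pastStart);     % false, which is how "if det>1" treats it
```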
Joseph Cheng on 27 Feb 2015
Edited: Joseph Cheng on 27 Feb 2015
Slight modification:
hiplist = [{'denim'};{'vinyl'};{'retro'}];
teststr = 'the retro denim #vinyl was #crazy #retro.';
pointsawarded=0;
for ind = 1:length(hiplist)
    det = strfind(teststr,hiplist{ind});
    if ~isempty(det)
        if det>1
            pointsawarded = pointsawarded+2*sum(teststr(det-1)=='#');
        end
    end
end
disp(teststr)
disp(['got ' num2str(pointsawarded) ' points'])
I moved the check for the number of # characters in front of the hits into the sum. This handles the case where a hipster word was used but was not tagged.
What is not implemented, and needs an adaptation, is what happens when retro is the first word in teststr: the det>1 if statement then gets skipped completely.
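One possible adaptation, offered as a sketch rather than as the poster's own fix, is to discard any match at index 1 before looking at the preceding character, since a match at the very start cannot follow a '#':

```matlab
% Sketch: keep only matches that have a character before them.
teststr = 'retro denim #retro';
det = strfind(teststr, 'retro');              % [1 14]

tagged = det(det > 1);                        % 14: drop the match at index 1
points = 2 * sum(teststr(tagged - 1) == '#'); % 2: one tagged occurrence
```

If no matches remain after filtering, the sum over an empty selection is 0, so no special case is needed.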
Stephen23 on 27 Feb 2015
Edited: Stephen23 on 1 Mar 2015
@Joseph Cheng: a simpler way of generating this cell array:
hiplist = [{'denim'};{'vinyl'};{'retro'}];
is like this:
hiplist = {'denim'; 'vinyl'; 'retro'};
Joseph Cheng on 27 Feb 2015
Edited: Joseph Cheng on 27 Feb 2015
Good call, I don't often deal with cells that are hard-coded.
Kratos on 27 Feb 2015
Thank you for the help but the problem is I have 14 different words and I have to compare it to the same line. Since I am going through the same line 14 times, it counts hostage every single time. How do I make sure that I count it only one time?


Asked: 27 Feb 2015
Edited: 1 Mar 2015