Adding up words in matrices in MATLAB

Example
hello
My name is Kevin
Hello my name is Susan
u1=[1]
u2=[0,1,1,1,1]
u3=[1,1,1,1,0,1]
So u1 is [1] because the word 'hello' appears in the first sentence. u2 is [0,1,1,1,1] because 'hello' is not in the second sentence but 'my', 'name', 'is' and 'kevin' are.
The same goes for u3: it holds the 0/1 values for 'hello', 'my', 'name', 'is', 'Kevin' and 'Susan' respectively, with 'Kevin' being 0 because it does not appear in this final sentence.
Because there are 6 different words in my example, the last vector should have 6 entries.
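A minimal sketch of the scheme described above, assuming the sentences are already in a cell array, that matching is case-insensitive (so 'Hello' and 'hello' count as the same word) and that words are separated by whitespace. The names `sentences`, `vocab` and `u` are illustrative, not from the original post. Each sentence first appends any unseen words to the vocabulary, then gets a 0/1 vector over the vocabulary built so far:

```matlab
% Example sentences from the question (assumed already read into a cell array)
sentences = {'hello', 'My name is Kevin', 'Hello my name is Susan'};

vocab = {};                       % words seen so far, in order of first appearance
u = cell(1, numel(sentences));    % u{k} is the 0/1 vector for sentence k

for k = 1:numel(sentences)
    % Split on whitespace and lowercase so 'Hello' and 'hello' match
    words = lower(strsplit(strtrim(sentences{k})));
    % Append words that have not been seen in any earlier sentence
    for w = 1:numel(words)
        if ~any(strcmp(words{w}, vocab))
            vocab{end+1} = words{w}; %#ok<AGROW>
        end
    end
    % 1 where the vocabulary word occurs in this sentence, 0 otherwise
    u{k} = double(ismember(vocab, words));
end
```

With the three example sentences this gives u{1} = 1, u{2} = [0 1 1 1 1] and u{3} = [1 1 1 1 0 1], and vocab ends up as hello, my, name, is, kevin, susan.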
How would I go about implementing such an algorithm in MATLAB?
The sentences are in a file that I have to read into MATLAB. I'm able to read the sentences and put them into matrices:
while ~feof(file)
    eachLine = fgetl(file);
    if isempty(eachLine) || strncmp(eachLine, '%', 1) || ~ischar(eachLine) ...
    matrix = regexp(eachLine, ' ', 'split');
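A self-contained sketch of the reading step, assuming lines starting with '%' are comments and that words are separated by whitespace. It writes a small temporary file (hypothetical content) so it can run on its own, and it tests ~ischar first because fgetl returns -1 at end of file:

```matlab
% Write a small demo file so the sketch runs on its own (hypothetical content)
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%% a comment line\nhello\n\nMy name is Kevin\nHello my name is Susan\n');
fclose(fid);

% Read the file line by line, skipping blank lines and lines starting with '%'
fid = fopen(fname, 'r');
sentences = {};
while ~feof(fid)
    eachLine = fgetl(fid);
    if ~ischar(eachLine) || isempty(strtrim(eachLine)) || strncmp(eachLine, '%', 1)
        continue;   % end of file, blank line or comment: nothing to split
    end
    % Split on runs of whitespace; each cell holds one word
    sentences{end+1} = regexp(strtrim(eachLine), '\s+', 'split'); %#ok<AGROW>
end
fclose(fid);
delete(fname);
```

Each element of sentences is then a cell array of words, ready to be compared against a vocabulary.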

Answers (2)

Babak on 11 Apr 2013


b = {'Hello' 'my' 'name' 'is' 'kevin' 'Susan'};
a = strsplit('kevin, kevin my baby I am telling you Hello Hello my name is Susan not Susana');
% a is the sentence (split into words) you want to test for whether b's keywords exist in it.
u = zeros(size(b));
for j = 1:length(b)
    counter = 0;
    for k = 1:length(a)
        if isequal(b{j}, a{k})
            counter = counter + 1;
        end
    end
    u(j) = counter;
end
u
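The nested loop above can also be written more compactly with cellfun and strcmp. This is only an equivalent sketch of the same count (case-sensitive, exact matches); a is written out as the tokens strsplit produces for that sentence, so 'kevin,' with its trailing comma does not match 'kevin':

```matlab
b = {'Hello' 'my' 'name' 'is' 'kevin' 'Susan'};
a = {'kevin,' 'kevin' 'my' 'baby' 'I' 'am' 'telling' 'you' 'Hello' 'Hello' ...
     'my' 'name' 'is' 'Susan' 'not' 'Susana'};

% For each keyword in b, count its exact (case-sensitive) matches in a
u = cellfun(@(w) sum(strcmp(w, a)), b);
```

Here u comes out as [2 2 1 1 1 1].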

Comments (4)

Blaise on 11 Apr 2013 (edited 12 Apr 2013)
Your code shows the number of times a word comes up in the sentence a. I'm trying to add up all the existing words into one matrix. I guess it's a little hard to explain.
b={'hello' 'my' 'name' 'is' 'kevin'}
a={'and' 'my' 'name' 'is' 'susan'}
Is the word 'hello' in b? Yes, output 1. Is 'my' in b? Yes, output 1...
Then you have [1 1 1 1 1] for b.
For the second sentence: is the word 'hello' in a? No, output 0. Is the word 'my' in a? Yes, output 1. Is the word 'name' in a? Yes, output 1, and so on for the words of b. Then you go through a: is the word 'and' in a? Yes, output 1. Is the word 'my' in a? Yes, but output nothing, because 'my' has already been seen. Continue up to: is the word 'susan' in a? Yes, output 1. So we get
[0 1 1 1 0 1 1]. The entries are for the words hello, my, name, is, kevin, and, susan. So the last vector covers every distinct word from both sentences and records whether each one occurs in the second sentence.
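A sketch of the two-sentence case just described, assuming no word is repeated within b or among the new words of a (so they can be appended to the vocabulary directly), with the vocabulary kept in order of first appearance. ismember returns, for each vocabulary word, whether it occurs in the given sentence:

```matlab
b = {'hello' 'my' 'name' 'is' 'kevin'};   % first sentence
a = {'and' 'my' 'name' 'is' 'susan'};     % second sentence

vocabB = b;                               % vocabulary after the first sentence
newInA = a(~ismember(a, vocabB));         % words of a not seen in b
vocab = [vocabB newInA];                  % vocabulary after the second sentence

uB = double(ismember(vocabB, b));         % first sentence against its own words
uA = double(ismember(vocab, a));          % second sentence against the full vocabulary
```

This reproduces the vectors in the comment: uB is [1 1 1 1 1] and uA is [0 1 1 1 0 1 1].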
Babak on 12 Apr 2013
My code assumes that there is a base set of words, b, and that you can check any sentence a against it to see how many times each word of b appears in a.
I think you can reformulate your problem to look like what I wrote for u, but I don't completely understand your explanation. You need to have a base and compare the other sentence to it. I don't get it: you compare b with itself and get all ones? That's obvious.
My aim is to determine which words are in each sentence. I'm doing a dialogue act tagging report and have to use MATLAB for it, so at the end I'm going to add up all these vectors to work out the mean.
My first sentence is b, so I take the first word of b and compare it with all the words in b, then do the same with 'name' and so on, which gives a vector of all 1s.
Then I search for the word 'hello' in a. It's not in a, so I assign 0 to 'hello'. Is the word 'my' in a? Yes, so I assign 1 to 'my'. Is 'name' in a? Yes, so I assign 1 to it. Is 'is' in a? Yes, assign 1. Is 'Kevin' in a? No, so assign 0, and so on. If a word repeats, it is not counted again.
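Because the per-sentence vectors have different lengths, they cannot be added directly. One way, sketched here with the example vectors from the question, is to pad each vector with zeros up to the final vocabulary size; this is valid because a word that enters the vocabulary later did not occur in any earlier sentence. Which mean the report needs is not stated, so the per-word average over sentences below is an assumption:

```matlab
% Per-sentence vectors from the example in the question (different lengths)
u = {1, [0 1 1 1 1], [1 1 1 1 0 1]};

nWords = max(cellfun(@numel, u));   % size of the final vocabulary
U = zeros(numel(u), nWords);        % one row per sentence, zero-padded
for k = 1:numel(u)
    U(k, 1:numel(u{k})) = u{k};     % words added later were unseen, so stay 0
end

total = sum(U, 1);   % number of sentences containing each word
avg = mean(U, 1);    % fraction of sentences containing each word
```

For the example, total is [2 2 2 2 1 1] and avg is total/3.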
for i = 1:length(a)
    for j = 1:length(a)
        if isequal(a{i}, a{j})
            a{i} = 1;   % note: this overwrites the word itself with the number 1
        end
    end
end
That's the first part of my implementation. I'm having difficulty combining both arrays into one and then comparing the new array with just b:
c = [a, b];
for i = 1:length(c)
    for j = 1:length(b)
        if isequal(c{i}, b{j})
            c{i} = 1;
        end
    end
end
However, this doesn't work: it outputs an array that still contains all the words, including the ones that occur in both sentences, and I can't assign 0 when the if statement fails. If I add else b{i}=0, I also get a wrong answer.
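A sketch of one way around the else problem: writing 0 or 1 into the cell array overwrites the words themselves, so this version keeps the words in vocab (assumed here to already hold the distinct words of both sentences) and writes the 0/1 results into a separate numeric vector u:

```matlab
vocab = {'hello' 'my' 'name' 'is' 'kevin' 'and' 'susan'};  % distinct words of both sentences
a = {'and' 'my' 'name' 'is' 'susan'};                      % sentence to test

u = zeros(1, numel(vocab));   % numeric 0/1 result; the word list is left untouched
for i = 1:numel(vocab)
    if any(strcmp(vocab{i}, a))
        u(i) = 1;
    else
        u(i) = 0;             % explicit, although zeros() already set it
    end
end
```

This gives u = [0 1 1 1 0 1 1], the same vector ismember(vocab, a) would return.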
This is how you can create the cell variable c that includes all the elements of both a and b:
b={'hello' 'my' 'name' 'is' 'kevin'};
a={'and' 'my' 'name' 'is' 'susan'};
c = [a b]


Matt Kindig on 12 Apr 2013 (edited 12 Apr 2013)


Another approach might be to use ismember(). For example:
dictionary = {'hello', 'my', 'name', 'is', 'kevin', 'susan'}; %words to match
Results = false(nLines, length(dictionary));
count = 1;
fid = fopen('your_file.txt');
while ~feof(fid)
    Line = strtrim(fgetl(fid)); %get line
    words = lower(regexp(Line, '\s+', 'split')); %split into (lowercase) words
    Results(count,:) = ismember(dictionary, words); %determine if present
end
%for each line k, Results(k,m) will indicate if the word at dictionary{m} is present.
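As Blaise notes in the comment below, nLines is never defined; in addition, count is never incremented, so every line would overwrite row 1 of Results. A corrected sketch that instead grows Results by one row per line, using a temporary demo file (hypothetical content) so it runs on its own:

```matlab
% Demo file so the sketch is self-contained (hypothetical content)
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'hello\nMy name is Kevin\nHello my name is Susan\n');
fclose(fid);

dictionary = {'hello', 'my', 'name', 'is', 'kevin', 'susan'}; % words to match
Results = false(0, numel(dictionary));  % grows by one row per line read
count = 0;
fid = fopen(fname, 'r');
while ~feof(fid)
    Line = fgetl(fid);
    if ~ischar(Line), break; end        % fgetl returns -1 at end of file
    words = lower(regexp(strtrim(Line), '\s+', 'split'));
    count = count + 1;                  % move to the next row for every line
    Results(count, :) = ismember(dictionary, words);
end
fclose(fid);
delete(fname);
```

Because the dictionary is fixed, every row has the same length, so a per-word average over lines is simply mean(Results, 1).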

Comments (1)

Blaise on 13 Apr 2013 (edited 16 Apr 2013)
EDIT: I've found a solution. I tried your code, but there's an error: nLines hasn't been declared.
I've more or less done it with the example I used above, without reading from a file, using ismember:
% ismember compares whole arrays, so the loops only repeat the same call
for i = 1:length(a)
    for j = 1:length(a)
        ismember(a, a)
    end
end
c = [a, b]
for i = 1:length(c)
    for j = 1:length(b)
        ismember(c, b)
    end
end
However, with this code, if a word appears more than once, it outputs 1 for every entry where it is found. I want it to ignore the second instance and put 0 there instead of 1. How can I do this?
I'm also now trying to do it by reading from a file, but I'm having difficulty with that. I want to read the first line and compare it with itself, then take the first AND second lines and compare them with the second, then the first, second AND third lines and compare them with the third, and so on.
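One way to ignore repeated words, sketched with the c = [a, b] example above: ismember still marks every occurrence, and a separate flag keeps only the first occurrence of each word, so later repeats become 0. This explicit loop is just one option:

```matlab
c = {'and' 'my' 'name' 'is' 'susan' 'hello' 'my' 'name' 'is' 'kevin'};  % [a, b]
b = {'hello' 'my' 'name' 'is' 'kevin'};

inB = ismember(c, b);          % 1 wherever the word occurs in b, repeats included

% Flag only the first occurrence of each word in c
isFirst = false(size(c));
for i = 1:numel(c)
    isFirst(i) = ~any(strcmp(c{i}, c(1:i-1)));
end

result = double(inB & isFirst);   % second and later occurrences become 0
```

For this c the result is [0 1 1 1 0 1 0 0 0 1]: the second 'my', 'name' and 'is' are set to 0.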



Asked: 11 Apr 2013
