
Comparing subsets of bagOfWords and converting bagOfWords to a cell array

mert karakaya on 24 May 2022
Edited: Yash Sharma on 22 Sep 2023
I have one big bagOfWords array and two smaller bagOfWords arrays. The vocabularies of the two smaller arrays are subsets of the big array's vocabulary.
Now I want to create an array whose first column is the big array's vocabulary, whose second column is the count from one of the two arrays for that row's word, and whose third column is the count from the other array for that row's word. If an array has no count for that row's word, I want that entry to be 1.
How can I do this? I didn't find any function that converts a bagOfWords to a cell array, and I'm not sure how to compare those arrays.
The arrays are in the attachment.

Answers (1)

Yash Sharma on 22 Sep 2023
Edited: Yash Sharma on 22 Sep 2023
I understand that you have three bagOfWords arrays: spam, nonspam, and allwords. The vocabularies of spam and nonspam are subsets of the allwords vocabulary. You will need to extract the individual words and their counts from each array and then merge them according to your logic.
Here is some example code that does this.
allwords = load('allwords.mat'); % Replace with the path to the big bagOfWords file
spam = load('spam.mat'); % Replace with the path to the first bagOfWords file
nonspam = load('nonspam.mat'); % Replace with the path to the second bagOfWords file
% load returns a struct. This assumes each bagOfWords was saved under the
% variable name "ans"; adjust the field name if your files differ.
% Extract each vocabulary as a column and the total count per word.
% Counts is a sparse documents-by-words matrix, so sum over the documents.
bigVocab = allwords.ans.Vocabulary(:);
bigCounts = full(sum(allwords.ans.Counts, 1))';
vocab1 = spam.ans.Vocabulary(:);
counts1 = full(sum(spam.ans.Counts, 1))';
vocab2 = nonspam.ans.Vocabulary(:);
counts2 = full(sum(nonspam.ans.Counts, 1))';
% Create the desired array: word, count in allwords, count in spam, count in nonspam
numWords = numel(bigVocab);
desiredArray = cell(numWords, 4);
for i = 1:numWords
    word = bigVocab(i);
    count0 = bigCounts(i);
    count1 = 1; % Default value if the word is not found in array 1
    count2 = 1; % Default value if the word is not found in array 2
    % Check if the word exists in array 1 and get its count value
    index1 = find(strcmp(vocab1, word), 1);
    if ~isempty(index1)
        count1 = counts1(index1);
    end
    % Check if the word exists in array 2 and get its count value
    index2 = find(strcmp(vocab2, word), 1);
    if ~isempty(index2)
        count2 = counts2(index2);
    end
    % Assign the values to the desired array
    desiredArray{i, 1} = char(word);
    desiredArray{i, 2} = count0;
    desiredArray{i, 3} = count1;
    desiredArray{i, 4} = count2;
end
% Display the desired array
disp(desiredArray);
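The same lookup can also be done without a loop by using ismember, which returns both a found flag and the matching position. The following is only a minimal sketch under stated assumptions: the vocabularies and counts are small made-up stand-ins (plain string arrays and numeric vectors) for what you would get from bag.Vocabulary and full(sum(bag.Counts, 1)), and the variable names are hypothetical. It builds the three-column layout described in the question.

```matlab
% Hypothetical stand-ins for bag.Vocabulary and full(sum(bag.Counts, 1))
bigVocab      = ["buy" "hello" "meeting" "offer"];
spamVocab     = ["offer" "buy"];
spamCounts    = [5 3];
nonspamVocab  = ["hello" "meeting" "offer"];
nonspamCounts = [4 2 1];

% Default every count to 1, then overwrite the entries for words that are present
spamCol = ones(numel(bigVocab), 1);
[tf, loc] = ismember(bigVocab, spamVocab);   % tf: found?, loc: position in spamVocab
spamCol(tf) = spamCounts(loc(tf));

nonspamCol = ones(numel(bigVocab), 1);
[tf, loc] = ismember(bigVocab, nonspamVocab);
nonspamCol(tf) = nonspamCounts(loc(tf));

% Three-column cell array: word, spam count, nonspam count
result = [cellstr(bigVocab(:)), num2cell(spamCol), num2cell(nonspamCol)];
disp(result)
```

Note that with a default of 1, a missing word and a word that really occurs once look the same in the result; that follows the rule stated in the question.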
Please find links to the documentation below, which I believe will be helpful for further reference.
