필터 지우기
필터 지우기

Matrix index is out of range for deletion

조회 수: 2 (최근 30일)
oliver
oliver 2023년 4월 10일
댓글: Walter Roberson 2023년 4월 10일
my project is sentiment analysis I am trying to follow the tutorial "Create Simple Text Model for Classification"
my database is a list of reviews with labelled sentiment (either 'positive' or 'negative)
I am trying to remove any documents containing no words from the bag-of-words model, and remove the corresponding entries in labels
my code is:
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = []; %produces an error
Deletion requires an existing variable.
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
  댓글 수: 7
oliver
oliver 2023년 4월 10일
with the code i recieve the error message "Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels." about line 25 "mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");"
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
YTrain = [];
XTrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
documentsTest = preprocessText(textDataTest);
XTest = encode(bag,documentsTest);
YPred = predict(mdl,XTest);
acc = sum(YPred == YTest)/numel(YTest);
str = [
"i hated this movie."
"this was really good"
"sometimes slow movies work out in the way you want and thats how this movie went"];
documentsNew = preprocessText(str);
XNew = encode(bag,documentsNew);
labelsNew = predict(mdl,XNew);
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
Walter Roberson
Walter Roberson 2023년 4월 10일
Yes, as I indicated, you are removing all documents from the bag, so your training information becomes empty.

댓글을 달려면 로그인하십시오.

채택된 답변

Walter Roberson
Walter Roberson 2023년 4월 10일
이동: Walter Roberson 2023년 4월 10일
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
whos Ytrain idx
Name Size Bytes Class Attributes Ytrain 181x1 423 categorical idx 1x181 1448 double
Ytrain(idx) = []; %produces an error
Xtrain = bag.Counts;
whos
Name Size Bytes Class Attributes Xtrain 0x0 24 double sparse Ytest 20x1 262 categorical Ytrain 0x1 242 categorical ans 1x46 92 char bag 1x1 640 bagOfWords cmdout 1x33 66 char cvp 1x1 3278 cvpartition data 201x2 543470 table dataTest 20x2 66077 table dataTrain 181x2 478944 table documents 181x1 43321 tokenizedDocument filename 1x1 178 string idx 1x181 1448 double textDataTest 20x1 64602 string textDataTrain 181x1 477308 string
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

Error in ClassificationECOC.prepareData (line 128)
classreg.learning.classif.FullClassificationModel.prepareData(X,Y,varargin{:});

Error in classreg.learning.FitTemplate/fit (line 246)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});

Error in ClassificationECOC.fit (line 119)
this = fit(temp,X,Y);

Error in fitcecoc (line 357)
obj = ClassificationECOC.fit(X,Y,varargin{:});
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
You are removing all of the documents. The bag is left empty.
  댓글 수: 2
oliver
oliver 2023년 4월 10일
편집: Walter Roberson 2023년 4월 10일
I am trying to follow this matlab link https://uk.mathworks.com/help/textanalytics/ug/create-simple-text-model-for-classification.html but using my own dataset. can you help with what i need to change?
Walter Roberson
Walter Roberson 2023년 4월 10일
You were calling removeShortWords twice, so all words less than 15 characters were being removed. The remaining "words" all happened to be unique, so removing infrequent words resulted in an empty bag.
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = [];
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
mdl
mdl =
CompactClassificationECOC ResponseName: 'Y' ClassNames: [negative positive] ScoreTransform: 'none' BinaryLearners: {[1×1 ClassificationLinear]} CodingMatrix: [2×1 double] Properties, Methods
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);
end

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Modeling and Prediction에 대해 자세히 알아보기

제품


릴리스

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by