How to pass large data (loaded into a datastore object) to a classifier in MATLAB?

Sara Salimi, 29 May 2017
Commented: Sara Salimi, 5 June 2017
This is my first experience working with datastores in `MATLAB`, so I hope I can get some guidance here. I have a large dataset, and I have saved the features and the corresponding label of each row into two `txt` files: `data.txt` and `label.txt`. Each file has `264e6` rows. I did the following steps:
% Create one datastore for the features and one for the labels
datafile='data.txt';
ds=datastore(datafile,'TreatAsMissing','NA');
labelfile='label.txt';
ds_lbl=datastore(labelfile,'TreatAsMissing','NA');
When I pass the data to the classifier, I get the following error:
Mdl=fitcnb(read(ds),read(ds_lbl));
Error using classreg.learning.FullClassificationRegressionModel.prepareDataCR (line 201)
X and Y do not have the same number of observations.
Error in classreg.learning.classif.FullClassificationModel.prepareData (line 487)
classreg.learning.FullClassificationRegressionModel.prepareDataCR(...
Error in ClassificationNaiveBayes.prepareData (line 143)
prepareData@classreg.learning.classif.FullClassificationModel(X,Y,varargin{:},'OrdinalIsCategorical',true);
Error in classreg.learning.FitTemplate/fit (line 213)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});
Error in ClassificationNaiveBayes.fit (line 132)
this = fit(temp,X,Y);
Error in fitcnb (line 307)
this = ClassificationNaiveBayes.fit(X,Y,RemainingArgs{:});
With the default `ReadSize` of `20000`, the classifier works. But whenever I change `ReadSize` to `1e6`, I get the same error. The other problem is that with the default `ReadSize`, the classifier is trained on only `20000` records, while I have `264e6` records.
I would really appreciate any suggestions. How can I pass a datastore to the classifier?
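Each call to `read` returns only the next chunk of at most `ReadSize` rows, which is why only the first `20000` records reach `fitcnb`. A quick check before training is to give both datastores the same `ReadSize` and compare the size of one chunk from each. This is a minimal diagnostic sketch, assuming `data.txt` and `label.txt` as in the question; it does not solve the "only one chunk" limitation.

```matlab
% Create one datastore per file, as in the question.
ds = datastore('data.txt', 'TreatAsMissing', 'NA');
ds_lbl = datastore('label.txt', 'TreatAsMissing', 'NA');

% read returns at most ReadSize rows per call, so give both datastores
% the same chunk size before reading them side by side.
ds.ReadSize = 1e6;
ds_lbl.ReadSize = 1e6;

X = read(ds);       % table with up to 1e6 rows of features
Y = read(ds_lbl);   % table with up to 1e6 rows of labels

% Compare observation and variable counts before calling fitcnb.
fprintf('feature rows: %d, label rows: %d\n', height(X), height(Y));
fprintf('feature vars: %d, label vars: %d\n', width(X), width(Y));
```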

Answers (1)

Don Mathis, 30 May 2017
I think you need to pass tall arrays or a tall table to fitcnb. See the documentation here: http://www.mathworks.com/help/stats/fitcnb.html?searchHighlight=fitcnb&s_tid=doc_srchtitle#bvnjlgv
and here:
You can get a tall table from a datastore like this:
tt = tall(ds)
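Creating a tall table does not load the file; operations on it are deferred until `gather` is called. A minimal sketch of creating and previewing the tall table, assuming the feature datastore from the question (the variable name `firstRows` is illustrative):

```matlab
% Datastore over the feature file, as in the question.
ds = datastore('data.txt', 'TreatAsMissing', 'NA');

% Wrap it in a tall table; no data is read at this point.
tt = tall(ds);

% head is also deferred; gather evaluates it. Previewing the first rows
% should only need the beginning of the file, not all 264e6 rows.
firstRows = gather(head(tt, 5));
disp(firstRows)
```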
Don Mathis, 5 June 2017
Edited: Don Mathis, 5 June 2017
I have not tried to do this myself, but from the error message it looks like you need to create your two tall arrays from the same datastore. So you'll need to put your labels in the same datastore as your features. I guess you could concatenate your two txt files "side by side", and then create your single datastore. After that, I think you would create a single tall array from that datastore, and then pass the 'features' columns of that as X and the 'label' column as Y, using the syntax fitcnb(X,Y).
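A minimal sketch of that suggestion, untested like the comment itself. It assumes both files have the same number of lines in the same order, that they are comma-delimited, and that the label ends up as the last column of the joined file. The file name `combined.txt` and the line-by-line merge are illustrative choices, not something from the thread; any tool that joins the two files line by line would do. Merging 264e6 lines one at a time in MATLAB will be slow, but memory use stays constant.

```matlab
% Step 1: join data.txt and label.txt "side by side" into one file.
% Assumption: comma delimiter; combined.txt is a hypothetical output name.
fidX = fopen('data.txt', 'r');
fidY = fopen('label.txt', 'r');
fidOut = fopen('combined.txt', 'w');
lineX = fgetl(fidX);
lineY = fgetl(fidY);
while ischar(lineX) && ischar(lineY)   % fgetl returns -1 at end of file
    fprintf(fidOut, '%s,%s\n', lineX, lineY);
    lineX = fgetl(fidX);
    lineY = fgetl(fidY);
end
fclose(fidX);
fclose(fidY);
fclose(fidOut);

% Step 2: one datastore and one tall table, so X and Y come from the
% same source and have matching observations.
ds = datastore('combined.txt', 'TreatAsMissing', 'NA');
tt = tall(ds);

names = ds.VariableNames;          % names detected by the datastore
X = tt(:, names(1:end-1));         % tall table of feature columns
Y = tt.(names{end});               % tall array holding the label column

% fitcnb accepts a tall table of predictors with a tall label array.
Mdl = fitcnb(X, Y);
```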
Sara Salimi, 5 June 2017
Thank you so much for mentioning this, sir. I really appreciate it. Many thanks once again for all your help and support.
