TreeBagger Training, large datasets

Views: 1 (last 30 days)
Claire Br
Claire Br on 27 Mar 2015
Edited: TED MOSBY on 18 Nov 2024
I want to train the TreeBagger classifier with a large dataset (a 4 million × 1 array). My PC runs out of memory if I try to do this in one run! Is there a way to run the training in a loop? I was wondering if I could first use a subset of the training data to train the TreeBagger algorithm and then update it with the remaining subsets. Could I use the results of the first training run as some kind of prior for the next?
Thanks, Claire

Answers (1)

TED MOSBY
TED MOSBY on 15 Nov 2024
Edited: TED MOSBY on 18 Nov 2024
The ‘TreeBagger’ class in MATLAB does not natively support incremental learning, which means you can't directly update an existing model with new data subsets.
You can try the following methods for efficient memory usage:
Train Multiple Models on Data Subsets:
  • Shuffle (or stratify) your data, then split it into chunks so that each chunk is representative and not biased
  • Train a separate TreeBagger model on each chunk
  • Combine the models at prediction time by averaging their predicted class scores
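The chunked approach above can be sketched as follows. This is a minimal sketch, not a drop-in solution: X, Y, Xtest, and nChunks are placeholder names for your own data, and it assumes every chunk contains examples of every class so that the score columns of the individual models line up.

```matlab
% Sketch: train separate TreeBagger models on random chunks of the data,
% then average their per-class scores at prediction time.
nChunks = 4;
idx = randperm(size(X, 1));                     % shuffle so chunks are unbiased
edges = round(linspace(0, size(X, 1), nChunks + 1));
models = cell(nChunks, 1);
for k = 1:nChunks
    rows = idx(edges(k) + 1:edges(k + 1));
    models{k} = TreeBagger(50, X(rows, :), Y(rows), 'Method', 'classification');
end

% Average the predicted class scores across all chunk models.
[~, scores] = predict(models{1}, Xtest);
for k = 2:nChunks
    [~, s] = predict(models{k}, Xtest);
    scores = scores + s;                        % assumes identical ClassNames order
end
scores = scores / nChunks;
[~, col] = max(scores, [], 2);
predicted = models{1}.ClassNames(col);          % final majority-score label
```

Averaging scores from forests trained on disjoint chunks behaves similarly to one forest with all the trees, since bagged trees are averaged independently anyway; the main caveat is the ClassNames ordering noted in the comments.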
Preprocess data:
Consider downsampling or preprocessing your data before training. Feature selection, dimensionality reduction (e.g., PCA), or using a smaller, more representative subset of the data can reduce the memory footprint.
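As a rough sketch of the preprocessing route (assuming a predictor matrix X and label vector Y; the 10% keep-rate and the 10 retained components are arbitrary values you would tune for your data):

```matlab
% Sketch: shrink the training set before calling TreeBagger.

% Option 1: stratified subsample. cvpartition with a label vector keeps
% the class proportions; here we keep 10% of the rows for training.
cv = cvpartition(Y, 'HoldOut', 0.9);
Xs = X(training(cv), :);
Ys = Y(training(cv));

% Option 2: project onto the leading principal components to cut the
% number of predictor columns (keep 10 components here; tune as needed).
[coeff, score] = pca(Xs);
Xr = score(:, 1:10);

model = TreeBagger(100, Xr, Ys, 'Method', 'classification');
```

Note that if you train on PCA scores, new observations must be centered with the training mean and projected with the same coeff before prediction.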
Alternative algorithms:
If the above methods don’t work, you can consider other machine learning libraries such as XGBoost and LightGBM, whose gradient-boosted trees are designed to handle large datasets efficiently.
Hope this helps!

Categories

Find more on Classification Ensembles in Help Center and File Exchange

