Make classification with huge dataset
I'm trying to do classification on a huge dataset containing data from 6 people for training, and with the data from just one person I'm getting this error: "Requested 248376x39305 (9.1GB) array exceeds maximum array size preference." I'm trying Bagged Tree and Neural Network classifiers first, and I want to ask how I can do this. Is it possible to train these classifiers on portions of the dataset (i.e., save a partially trained classification model and continue training it later)?
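For the neural network, training in portions is possible in principle, because `train` resumes from the network's current weights on each call. A minimal sketch, assuming the data is stored in a file `data.mat` with variables `X` (features x samples) and `T` (targets) — these names, the file, and the chunk size are assumptions for illustration:

```matlab
% Sketch: warm-start training of a pattern net on one chunk at a time,
% using matfile so the full 9.1 GB array is never loaded into memory.
m = matfile('data.mat');          % lazy handle; nothing is loaded yet
[~, N] = size(m, 'X');            % total number of samples
chunk = 50000;                    % samples per portion (sized to fit in RAM)

net = patternnet(20);             % small hidden layer; adjust as needed
net.trainParam.epochs = 50;

for start = 1:chunk:N
    stop = min(start + chunk - 1, N);
    Xc = m.X(:, start:stop);      % load only this portion from disk
    Tc = m.T(:, start:stop);
    net = train(net, Xc, Tc);     % train() continues from current weights
end
```

Note that sequentially training on chunks is not equivalent to training on the whole dataset at once — later chunks can partially overwrite what was learned from earlier ones — so the chunks should be shuffled and the resulting accuracy checked against a held-out set.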
9 Comments
Greg Heath
7 Nov 2016
Please explain how a 248376 x 39305 array constitutes a one-person data set:
[ I N ] = size(input)
[ O N ] = size(target)
Thanks,
Greg
Mindaugas Vaiciunas
7 Nov 2016
Edited: Walter Roberson 7 Nov 2016
Walter Roberson
7 Nov 2016
Please show your Tree Bagging code. https://www.mathworks.com/help/stats/treebagger.html does not return matrices.
Mindaugas Vaiciunas
7 Nov 2016
Walter Roberson
7 Nov 2016
Have you considered reducing the number of trees?
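One way to try this without committing to a small ensemble up front is to start with a modest number of trees and grow the ensemble only if the out-of-bag error justifies it. A sketch, assuming `X` (samples x features) and `Y` (labels) are the training data:

```matlab
% Sketch: start small, check accuracy, grow only if needed.
B = TreeBagger(10, X, Y, 'OOBPrediction', 'on');  % 10 trees instead of 50+

err = oobError(B);          % out-of-bag error after each tree
plot(err)                   % has the error curve flattened yet?

B = growTrees(B, 10);       % add 10 more trees to the existing ensemble
```

Since `growTrees` extends the ensemble incrementally, the error curve can be monitored and growth stopped as soon as additional trees stop paying for their memory.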
Mindaugas Vaiciunas
8 Nov 2016
Greg Heath
9 Nov 2016
Edited: Greg Heath 9 Nov 2016
I still don't get it:
39305/765
ans =
51.3791
Regardless, I think you should use dimensionality reduction via feature extraction.
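As one concrete option for feature extraction, PCA can cut the 39305 columns down to the components that carry most of the variance. A sketch, assuming `X` is samples x features — note that running `pca` on the full 248376 x 39305 matrix is itself memory-hungry, so it may need to be fit on a random subsample of rows first:

```matlab
% Sketch: PCA-based dimensionality reduction, keeping ~95% of the variance.
[coeff, score, ~, ~, explained] = pca(X);

k  = find(cumsum(explained) >= 95, 1);  % components needed for 95% variance
Xr = score(:, 1:k);                     % reduced matrix: samples x k features
```

New data is then projected with the same `coeff(:, 1:k)` before being fed to the classifier, so train and test features live in the same reduced space.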
Hope this helps,
Greg
Mindaugas Vaiciunas
9 Nov 2016
Greg Heath
10 Nov 2016
Of course it will affect it. However, the way to choose is to set a limit on the loss of accuracy.
Answers (1)
Walter Roberson
7 Nov 2016
0 votes
Add more memory (RAM) to your computer. Then check or adjust Preferences -> MATLAB -> Workspace -> MATLAB array size limit.
Or, you could set the division ratios so that a much smaller fraction is used for training and validation, with most of it left for test. This effectively uses only a small subset of the data, but a different small subset each time it trains.
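The division ratios above can be set on the network object before training. A sketch, assuming `X` and `T` are the input and target matrices and a `patternnet` is being used (the 5/5/90 split is an illustrative choice, not a recommendation):

```matlab
% Sketch: make each training run touch only a small random subset.
net = patternnet(20);

net.divideFcn = 'dividerand';        % new random division on every train() call
net.divideParam.trainRatio = 0.05;   % 5% of samples used for training
net.divideParam.valRatio   = 0.05;   % 5% for validation (early stopping)
net.divideParam.testRatio  = 0.90;   % the remaining 90% held out as test

net = train(net, X, T);
```

Because `dividerand` draws a fresh random split each time `train` runs, repeated training sessions sample different small subsets of the full dataset, as described above.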
6 Comments
Mindaugas Vaiciunas
7 Nov 2016
Walter Roberson
7 Nov 2016
Amazon Web Services, among other providers, makes available machines with more than 36 GB of RAM. If you had that much RAM your program would run; therefore adding RAM is a solution to the problem.
Mindaugas Vaiciunas
8 Nov 2016
Walter Roberson
8 Nov 2016
https://www.mathworks.com/products/parallel-computing/matlab-parallel-cloud/ 16 workers, 60 GB, US$4.32 per hour educational pricing, including compute services.
Or if you provide your own EC2 instance, https://www.mathworks.com/products/parallel-computing/parallel-computing-on-the-cloud/distriben-ec2.html $0.07 per worker per hour for the software licensing from MathWorks. For example you could use https://aws.amazon.com/ec2/pricing/on-demand/ m4.4xlarge, 16 cores, 64 GB, US$0.958 per hour for the EC2 service. Between that and the $0.07 per worker from MathWorks it would come to less than US$2.50 per hour. About the price of a Starbucks "Grande" coffee.
Remember, your time is not really "free". At the very least you need to take into account "opportunity costs" -- like an hour spent fighting a memory issue is an hour you could have been working on a minimum wage job.
Mindaugas Vaiciunas
9 Nov 2016
Walter Roberson
9 Nov 2016
Let me put it this way:
- You do not wish to reduce the number of trees or the amount of data, because doing so might decrease the recognition rate.
- We do not have a magic low-memory implementation of TreeBagger available.
- You do not have enough memory on your system to run the classification using the existing software.
Your choices would seem to be:
- write the classifier yourself, somehow not using as much memory; or
- obtain more memory for your own system; or
- obtain use of a system with more memory