I have a large array of small integer inputs and want to create a small array of Double outputs. Best machine learning approach?

I have a large amount of training data (about 10,000 samples). Each sample has a large array (~500,000 elements) of small integer inputs (typically 0 or 1, with a tail that falls off exponentially toward 10) and a relatively small array (~30 elements) of Double values between 0 and 1 as outputs.
In general, the elements of the input array are uncorrelated with each other (meaning they independently affect the output). Technically there are some correlations, but I'd be willing to ignore them, as they are not terribly important.
I can add a few categorical and numeric engineered features to the input that would have a significant effect on the output.
I would like to try a machine learning approach to this problem within MATLAB, but I am worried about the large number of inputs. Is there an approach that's viable, or would some dimensionality reduction be absolutely necessary before proceeding? Any sort of dimensionality reduction would destroy the correlations between inputs, but I can live with that.

Accepted Answer

Dheeraj Singh, 30 September 2019
Since the number of data samples is smaller than the number of features, using all of the features can degrade prediction performance, even when all of the features are relevant and contain information about the response variable.
You need to do feature selection to reduce the dimensionality of your data.
You can refer to the Statistics and Machine Learning Toolbox documentation on feature selection to learn more about the different techniques available.
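For illustration, here is a minimal sketch of one way to handle this many predictors without an explicit dimensionality-reduction step: fitrlinear is designed for high-dimensional (and sparse) predictor data, and lasso regularization performs embedded feature selection by driving most coefficients to zero. The variable names (X, Y, Xnew), sizes, and parameter choices below are assumptions, not taken from the original post; one model is trained per output column because fitrlinear fits a single response at a time.

% Minimal sketch, assuming:
%   X    - 10000-by-500000 (preferably sparse) matrix of integer inputs
%   Y    - 10000-by-30 matrix of Double outputs in [0, 1]
%   Xnew - new inputs with the same number of columns as X
numOutputs = size(Y, 2);
models = cell(1, numOutputs);
for k = 1:numOutputs
    % Lasso-regularized least squares; the SpaRSA solver supports
    % lasso penalties and sparse, high-dimensional predictors.
    models{k} = fitrlinear(X, Y(:, k), ...
        'Learner', 'leastsquares', ...
        'Regularization', 'lasso', ...
        'Solver', 'sparsa');
end

% Predict all ~30 outputs for the new data.
Ypred = zeros(size(Xnew, 1), numOutputs);
for k = 1:numOutputs
    Ypred(:, k) = predict(models{k}, Xnew);
end

Alternatively, a PCA step (pca) or a selection method such as relieff or sequentialfs could reduce the number of input columns before fitting a nonlinear model, though sequential selection may be slow with this many predictors.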

More Answers (1)

Doug Rank, 10 October 2019
Thank you! I will check out those options and see which are appropriate.
