Increasing efficiency of one-hot encoding

I have a dataset with 50 variables and one output. The dataset has 17 categories. I want to do feature selection to determine which variables are significant. I am using the fsrnca function with one-hot encoding: I build a matrix of size (no. of observations) x 17 containing 1s and 0s to represent the categories and concatenate it with X, so that X' = [X_categories X] while y stays as it is. Is there a faster way of doing this than the standard one-hot encoding approach? Run time is very slow because of the high dimensionality. Hope this makes sense. Thanks!

Comments: 3

Mohammad Sami on 16 Jan 2020
Which step is taking very long?
darova on 16 Jan 2020
And where is the code?
Athul Prakash on 28 Jan 2020
Kindly provide your code so that others can investigate which step is slowing you down.


Answers (1)

Walter Roberson on 28 Jan 2020


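% Category index of each observation, as a double row vector
% (plain double indices are safe for both sub2ind and sparse)
catnum = double(TheCategorical(:).');
NumberOfObservations = numel(catnum);
numcat = max(catnum);
% Dense one-hot matrix: one row per observation, one column per category
OH = zeros(NumberOfObservations, numcat);
OH(sub2ind(size(OH), 1:NumberOfObservations, catnum)) = 1;
Or, to store only the nonzero entries:
catnum = double(TheCategorical(:).');
OH = sparse(1:NumberOfObservations, catnum, 1);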
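
To connect this back to the question, here is a rough sketch of how the one-hot block could be joined to the numeric predictors and passed to fsrnca. The toy data and every variable name (X, c, y, Xaug, and so on) are made up for illustration, and the sketch assumes there is a single categorical variable with 17 categories. The sparse matrix is converted with full() first, on the assumption that fsrnca expects an ordinary numeric matrix.

```matlab
% Toy data standing in for the real dataset (all names here are hypothetical).
rng(0);                                   % reproducible example
n = 200;                                  % number of observations
X = randn(n, 5);                          % numeric predictors
c = categorical(randi(17, n, 1), 1:17);   % categorical variable, 17 categories
y = X(:,1) + 0.5*(double(c) == 3) + 0.1*randn(n, 1);   % response

% One-hot block: one column per defined category, exactly one 1 per row.
codes = double(c);                        % category index of each observation
K     = numel(categories(c));             % number of defined categories
OH    = full(sparse((1:n).', codes, 1, n, K));

% Augmented predictor matrix; y is left unchanged.
Xaug = [OH, X];

% Feature selection with NCA for regression
% (requires Statistics and Machine Learning Toolbox).
mdl = fsrnca(Xaug, y, 'Solver', 'lbfgs');
w   = mdl.FeatureWeights;                 % one weight per column of Xaug
```

Building OH this way is a single vectorized call, so it is unlikely to be the bottleneck. Wrapping the encoding step and the fsrnca call separately in tic/toc should show which one actually dominates the run time.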


Asked: 14 Jan 2020

Answered: 28 Jan 2020
