Matching feature ranking algorithm outputs in Classification Learner
Hi,
In R2023b, within the feature ranking tab of the Classification Learner app, you can generate feature ranks with five different algorithms: MRMR, Chi2, ReliefF, ANOVA, and Kruskal-Wallis.
MRMR and Chi2 can be replicated with:
[idx,scores] = fscmrmr(randSamp(:,3:end),'Stage');
[idx,scores] = fscchi2(randSamp(:,3:end),'Stage');
Where randSamp is a table with some variables ignored at the start and 'Stage' is the label of interest.
However, I cannot figure out how to replicate the ANOVA and Kruskal-Wallis rankings. I have tried something like this:
[idx,scores] = anova1(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
[idx,scores] = kruskalwallis(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
And while it does compute *something*, I have no idea what it is doing or how to get it to match what the Classification Learner app does. Can anyone shed some light on this?
Christopher
Accepted Answer
Drew
6 November 2023
The short answer is that, for some feature ranking techniques, there is some normalization of the features before the ranking. This is by design, since some feature ranking techniques are particularly sensitive to normalization. To see how Classification Learner is ranking the features, use the "Generate Function" button in Classification Learner to generate code to replicate the feature selection.
For example, take these steps to see some example generated code:
(1) t=readtable("fisheriris.csv");
(2) Start Classification Learner, load the fisher iris data, take defaults at session start
(3) Rank features with Kruskal-Wallis, choosing to keep the top three features
(4) Train the default tree model
(5) In the Export area of the toolstrip, choose "Generate Function".
Below is a section of code from the function generated by Classification Learner. Notice the calls to "standardizeMissing" and "normalize" in the first two lines of (non-comment) code. These functions are also used in the later cross-validation part of the code. So, for each training fold (or for all of the training data for the final model), the "standardizeMissing" function and the default "zscore" method of the "normalize" function are being used before ranking the features. Note: The normalization used before feature ranking is independent of any normalization (or no normalization) used before model training.
% Feature Ranking and Selection
% Replace Inf/-Inf values with NaN to prepare data for normalization
predictors = standardizeMissing(predictors, {Inf, -Inf});
% Normalize data for feature ranking
predictorMatrix = normalize(predictors, "DataVariables", ~isCategoricalPredictor);
newPredictorMatrix = zeros(size(predictorMatrix));
for i = 1:size(predictorMatrix, 2)
    if isCategoricalPredictor(i)
        newPredictorMatrix(:,i) = grp2idx(predictorMatrix{:,i});
    else
        newPredictorMatrix(:,i) = predictorMatrix{:,i};
    end
end
predictorMatrix = newPredictorMatrix;
responseVector = grp2idx(response);
% Rank features using Kruskal Wallis algorithm
for i = 1:size(predictorMatrix, 2)
    pValues(i) = kruskalwallis(...
        predictorMatrix(:,i), ...
        responseVector, ...
        'off');
end
[~,featureIndex] = sort(-log(pValues), 'descend');
numFeaturesToKeep = 3;
includedPredictorNames = predictors.Properties.VariableNames(featureIndex(1:numFeaturesToKeep));
predictors = predictors(:,includedPredictorNames);
isCategoricalPredictor = isCategoricalPredictor(featureIndex(1:numFeaturesToKeep));
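Applying the same pattern to your randSamp table, the ANOVA ranking can presumably be replicated by swapping kruskalwallis for anova1 in the loop above. The sketch below is my assumption (check it against the code Classification Learner actually generates for ANOVA on your data); it assumes your numeric predictors start at column 4 and the label is the 'Stage' variable, as in your calls:

```matlab
% Sketch: replicate Classification Learner's ANOVA ranking outside the app.
% Assumes randSamp is a table of numeric predictors (columns 4:end) plus a
% 'Stage' label, and that the ANOVA ranking mirrors the generated
% Kruskal-Wallis code (normalize, then per-feature p-values).
predictors = randSamp(:, 4:end);
response   = categorical(randSamp.Stage);

predictors      = standardizeMissing(predictors, {Inf, -Inf}); % Inf -> NaN
predictorMatrix = table2array(normalize(predictors));          % zscore per column
responseVector  = grp2idx(response);

nPredictors = size(predictorMatrix, 2);
pValues = zeros(1, nPredictors);
for i = 1:nPredictors
    % anova1 with 'off' suppresses the figures and returns the p-value
    pValues(i) = anova1(predictorMatrix(:, i), responseVector, 'off');
end
[~, featureIndex] = sort(-log(pValues), 'descend'); % smallest p ranked first
rankedNames = predictors.Properties.VariableNames(featureIndex)
```

Note that sorting -log(pValues) in descending order is equivalent to sorting the p-values in ascending order, so the most significant features come first.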
If this answer helps you, please remember to accept the answer.