Matching feature ranking algorithm outputs in Classification Learner
Hi,
In R2023b, within the feature ranking tab of the Classification Learner app, you can generate feature ranks with five different algorithms: MRMR, Chi2, ReliefF, ANOVA, and Kruskal-Wallis.
MRMR and Chi2 can be replicated with:
[idx,scores] = fscmrmr(randSamp(:,3:end),'Stage');
[idx,scores] = fscchi2(randSamp(:,3:end),'Stage');
Where randSamp is a table with some variables ignored at the start and 'Stage' is the label of interest.
However, I cannot figure out how to replicate the ANOVA and Kruskal-Wallis rankings. I have tried something like this:
[idx,scores] = anova1(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
[idx,scores] = kruskalwallis(table2array(randSamp(:,4:end))',categorical(randSamp.Stage(:)));
And while it does compute *something*, I have no idea what it is doing or how to get it to match what the Classification Learner app does. Can anyone shed some light on this?
Christopher
Accepted Answer
Drew
6 November 2023
The short answer is that, for some feature ranking techniques, there is some normalization of the features before the ranking. This is by design, since some feature ranking techniques are particularly sensitive to normalization. To see how Classification Learner is ranking the features, use the "Generate Function" button in Classification Learner to generate code to replicate the feature selection.
For example, take these steps to see some example generated code:
(1) t=readtable("fisheriris.csv");
(2) Start Classification Learner, load the fisher iris data, take defaults at session start
(3) Rank features with Kruskal-Wallis, choosing to keep the top three features
(4) Train the default tree model
(5) In the Export area of the toolstrip, choose "Generate Function".
Below is a section of code from the function generated by Classification Learner. Notice the calls to "standardizeMissing" and "normalize" in the first two lines of (non-comment) code. These functions are also used in the later cross-validation part of the code. So, for each training fold (or for all of the training data for the final model), the "standardizeMissing" function and the default "zscore" method of the "normalize" function are being used before ranking the features. Note: The normalization used before feature ranking is independent of any normalization (or no normalization) used before model training.
% Feature Ranking and Selection
% Replace Inf/-Inf values with NaN to prepare data for normalization
predictors = standardizeMissing(predictors, {Inf, -Inf});
% Normalize data for feature ranking
predictorMatrix = normalize(predictors, "DataVariables", ~isCategoricalPredictor);
newPredictorMatrix = zeros(size(predictorMatrix));
for i = 1:size(predictorMatrix, 2)
    if isCategoricalPredictor(i)
        newPredictorMatrix(:,i) = grp2idx(predictorMatrix{:,i});
    else
        newPredictorMatrix(:,i) = predictorMatrix{:,i};
    end
end
predictorMatrix = newPredictorMatrix;
responseVector = grp2idx(response);
% Rank features using Kruskal Wallis algorithm
for i = 1:size(predictorMatrix, 2)
    pValues(i) = kruskalwallis(...
        predictorMatrix(:,i), ...
        responseVector, ...
        'off');
end
[~,featureIndex] = sort(-log(pValues), 'descend');
numFeaturesToKeep = 3;
includedPredictorNames = predictors.Properties.VariableNames(featureIndex(1:numFeaturesToKeep));
predictors = predictors(:,includedPredictorNames);
isCategoricalPredictor = isCategoricalPredictor(featureIndex(1:numFeaturesToKeep));
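Applying the same pattern to your randSamp table, the ANOVA ranking can presumably be replicated by swapping kruskalwallis for anova1 in the loop above. The sketch below is my assumption (check it against the code Classification Learner actually generates for ANOVA on your data); it assumes your numeric predictors start at column 4 and the label is the 'Stage' variable, as in your calls:

```matlab
% Sketch: replicate Classification Learner's ANOVA ranking outside the app.
% Assumes randSamp is a table of numeric predictors (columns 4:end) plus a
% 'Stage' label, and that the ANOVA ranking mirrors the generated
% Kruskal-Wallis code (normalize, then per-feature p-values).
predictors = randSamp(:, 4:end);
response   = categorical(randSamp.Stage);

predictors      = standardizeMissing(predictors, {Inf, -Inf}); % Inf -> NaN
predictorMatrix = table2array(normalize(predictors));          % zscore per column
responseVector  = grp2idx(response);

nPredictors = size(predictorMatrix, 2);
pValues = zeros(1, nPredictors);
for i = 1:nPredictors
    % anova1 with 'off' suppresses the figures and returns the p-value
    pValues(i) = anova1(predictorMatrix(:, i), responseVector, 'off');
end
[~, featureIndex] = sort(-log(pValues), 'descend'); % smallest p ranked first
rankedNames = predictors.Properties.VariableNames(featureIndex)
```

Note that sorting -log(pValues) in descending order is equivalent to sorting the p-values in ascending order, so the most significant features come first.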
If this answer helps you, please remember to accept the answer.