How is predictor importance for classification trees calculated?

Question

Ryan Jones, 22 January 2021


https://kr.mathworks.com/matlabcentral/answers/724163-how-is-predictor-importance-for-classification-trees-calculated

Commented: Ryan Jones, 9 April 2021


I am using MATLAB's predictorImportance function to evaluate the usefulness of features I am extracting from 360° images.

I don't fully understand how predictor importance estimates are calculated and was hoping for a mathematical explanation of the algorithm used.
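For reference, here is the definition we take from the current predictorImportance documentation. The symbols are our own notation, not MATLAB's. P_t is the node probability of node t, roughly the weighted fraction of training observations that reach it. E_t is the node impurity, which is Gini's diversity index by default. t_L and t_R are the two children of t. N_branch is the total number of branch nodes. B_j is the set of branch nodes whose chosen split uses predictor x_j; when surrogate splits are not used, these are the only nodes that contribute to x_j.

```latex
\begin{aligned}
R_t &= P_t\,E_t, \\
\Delta R_t &= R_t - R_{t_L} - R_{t_R}, \\
\mathrm{Imp}(x_j) &= \frac{1}{N_{\text{branch}}} \sum_{t \in \mathcal{B}_j} \Delta R_t .
\end{aligned}
```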

I have read the MATLAB documentation on this; however, I am unsure about a few things.

Firstly, what is risk? I have assumed it to be the impurity reduction when using the Gini index as the splitting criterion.
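The sketch below shows node risk as we understand the current documentation to define it: node probability times Gini impurity. It also shows the risk change credited to a single split. It assumes unit observation weights and the default empirical class priors, so the node probability is simply the fraction of all training observations that fall in the node. The function names are our own and are not part of any MATLAB API.

```python
def gini_index(counts):
    """Gini's diversity index 1 - sum_i p(i)^2 for a list of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def node_risk(counts, n_total):
    """Node risk R = P * E, with P = n_node / n_total (empirical priors, unit weights assumed)."""
    return (sum(counts) / n_total) * gini_index(counts)


def risk_change(parent, left, right, n_total):
    """Risk change credited to one split: R_parent - R_left - R_right."""
    return (node_risk(parent, n_total)
            - node_risk(left, n_total)
            - node_risk(right, n_total))


if __name__ == "__main__":
    # Toy root node: 6 observations of class A and 4 of class B,
    # split into a left child [5, 1] and a right child [1, 3].
    print(risk_change([6, 4], [5, 1], [1, 3], n_total=10))
```

Because each child's probability equals the parent's probability times the child's share of the parent, the risk change is the parent's node probability multiplied by the usual impurity reduction. So describing it as "impurity reduction" is close, but the value is weighted by how many observations reach the node.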

Secondly, what does "This sum is taken over best splits found at each branch node" mean when surrogate splits aren't used?
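When surrogate splits are not used, each branch node contributes exactly one term to the sum: the risk change from the single best split chosen at that node, credited to the predictor used for that split. The sketch below accumulates those terms per predictor and divides by the number of branch nodes. The list-of-tuples tree description is a hypothetical structure for illustration, not MATLAB's tree object. Empirical priors and unit weights are assumed, as in a plain count-based node probability.

```python
from collections import defaultdict


def gini_index(counts):
    """Gini's diversity index 1 - sum_i p(i)^2 for a list of class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def node_risk(counts, n_total):
    """Node risk R = P * E with P = n_node / n_total (assumption: unit weights)."""
    return (sum(counts) / n_total) * gini_index(counts)


def importance(branch_nodes, n_predictors, n_total):
    """Per-predictor sum of best-split risk changes, divided by the number of branch nodes.

    branch_nodes: hypothetical list of (predictor_index, parent, left, right),
    one entry per branch node, where parent/left/right are class-count lists.
    """
    totals = defaultdict(float)
    for j, parent, left, right in branch_nodes:
        # Only the chosen (best) split at this node is counted, credited to predictor j.
        totals[j] += (node_risk(parent, n_total)
                      - node_risk(left, n_total)
                      - node_risk(right, n_total))
    n_branch = len(branch_nodes)
    return [totals[j] / n_branch for j in range(n_predictors)]


if __name__ == "__main__":
    # Root [6, 4] split on predictor 0; its left child [5, 1] then split on predictor 2.
    tree = [(0, [6, 4], [5, 1], [1, 3]),
            (2, [5, 1], [5, 0], [0, 1])]
    print(importance(tree, n_predictors=3, n_total=10))
```

Predictor 1 is never used for a split in this toy tree, so its estimate is zero.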

Finally, I don't understand why the estimates change when you reorder the columns in the feature matrix.

Thank you in advance to anyone able to shed light on this for me.


Answer 1

Gaurav Garg, 27 January 2021


https://kr.mathworks.com/matlabcentral/answers/724163-how-is-predictor-importance-for-classification-trees-calculated#answer_607303

Hi Ryan,

Yes, risk means impurity reduction if using the Gini index as the splitting criterion. You can also specify 'twoing' or 'deviance' as the split criterion by following the doc here.
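For comparison, the three split criteria can be sketched as follows. The Gini and deviance formulas are the node-impurity definitions we understand the fitctree documentation to use, with deviance taken in base 2. Twoing is different: it is not a node impurity but a score for a candidate split that the search maximizes. This is an illustration under those assumptions, not MATLAB's code.

```python
import math


def gini(p):
    """Gini's diversity index: 1 - sum_i p(i)^2 for class proportions p."""
    return 1.0 - sum(pi * pi for pi in p)


def deviance(p):
    """Deviance impurity: -sum_i p(i) log2 p(i), treating 0 * log 0 as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def twoing(left_counts, right_counts):
    """Twoing score of a split: P(L) P(R) (sum_i |L(i) - R(i)|)^2; larger is better.

    L(i), R(i) are class-i fractions within the left/right child;
    P(L), P(R) are the fractions of observations sent left/right.
    """
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    spread = sum(abs(l / n_left - r / n_right)
                 for l, r in zip(left_counts, right_counts))
    return (n_left / n) * (n_right / n) * spread ** 2


if __name__ == "__main__":
    print(gini([0.5, 0.5]), deviance([0.5, 0.5]))  # impurity of a 50/50 node
    print(twoing([5, 0], [0, 5]))                  # a perfectly separating split
```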

To understand why the estimates change when you reorder the columns, you can go through the doc here, which explains the algorithm behind node selection and the splitting of each branch node.
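The core of a CART-style split search on one numeric predictor works as follows. Sort the observations by that predictor's values. Try a cut halfway between each pair of consecutive distinct values. Keep the cut with the largest impurity reduction. The sketch below does this as a generic exhaustive search with Gini impurity, written for illustration. It is not MATLAB's implementation, which also handles observation weights, priors, categorical predictors, and other options.

```python
def gini(labels):
    """Gini's diversity index of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_cut(x, y):
    """Return (gain, cut) maximizing the Gini reduction for numeric predictor x.

    gain = E(parent) - (n_L / n) E(left) - (n_R / n) E(right).
    Returns (0.0, None) if no cut improves purity (e.g. x is constant).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])  # sort this predictor's values
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(ys)
    parent = gini(ys)
    best = (0.0, None)
    for k in range(1, n):
        if xs[k] == xs[k - 1]:
            continue  # cannot cut between equal values
        gain = parent - (k / n) * gini(ys[:k]) - ((n - k) / n) * gini(ys[k:])
        if gain > best[0]:  # strict '>' keeps the first of several tied cuts
            best = (gain, (xs[k - 1] + xs[k]) / 2)
    return best


if __name__ == "__main__":
    print(best_cut([4.0, 1.0, 3.0, 2.0], ["b", "a", "b", "a"]))
```

Note that the sorting here is over the values of one predictor (the rows), and it is repeated independently for every predictor.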


Ryan Jones, 9 April 2021

Hi Gaurav,

Thanks for the reply. I have now seen that the documentation explaining predictor importance has been updated, and it is a lot clearer, thank you.

However, I still don't understand why the estimates change when you reorder the columns. I have read the doc here on node splitting rules. As I understand it, the only step in the algorithm affected by the order of the columns is step 3, where the columns are sorted in ascending order (by column index, I assume). However, I don't see why the order matters, because in step 4 the purity gain is calculated for every feature (column), and the one chosen for the split is the one with the greatest purity gain. Since all columns are considered in step 4 anyway, the order shouldn't matter.
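One plausible way column order can matter, even though every predictor is evaluated, is through ties. Suppose two predictors give exactly the same best gain at a node, for example duplicated or perfectly correlated features. The search must then pick one of them, and a scan that keeps the first maximum credits whichever column comes first. We have not confirmed that this is how MATLAB breaks ties, so treat the sketch below as a hypothesis to test. The gain values and feature names are hypothetical.

```python
def choose_predictor(gains):
    """Index of the predictor with the largest gain; ties go to the lowest column index."""
    best_j = 0
    for j in range(1, len(gains)):
        if gains[j] > gains[best_j]:  # strict '>' keeps the earlier column on a tie
            best_j = j
    return best_j


def credit(gains, names):
    """Name of the predictor that would receive the importance for this split."""
    return names[choose_predictor(gains)]


if __name__ == "__main__":
    # Hypothetical best gains at one node: 'f1' and 'f2' tie exactly.
    gains = [0.25, 0.25, 0.10]
    names = ["f1", "f2", "f3"]
    print(credit(gains, names))  # f1 gets the credit

    # Reorder the columns so f2 comes first: the same node now credits f2.
    order = [1, 0, 2]
    print(credit([gains[i] for i in order], [names[i] for i in order]))
```

If the tied splits partition the observations differently, the subtrees grown below them also differ. In that case the change can propagate to the estimates of other predictors, not only the two that tied.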

I hope my misunderstanding is clear and thank you in advance for your help.
