How to avoid uncertainty in processing result of MATLAB Statistics Toolbox

Question

jean young, 24 February 2011


https://kr.mathworks.com/matlabcentral/answers/1921-how-to-avoid-uncertainty-in-processing-result-of-matlab-statistics-toolbox

Accepted answer: Mahmoud Hammoud

My MATLAB program returns different results from one run to the next, which is frustrating. My code is as follows.

%-----------------------------

clear all; close all;

% Column vector: kmeans treats each row as one observation
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';

[idx c] = kmeans(a,2)

rate = c(1)/c(2)

%-----------------------------

I ran this program several times and found that, although the input data set is fixed, the results can differ from run to run. I found at least four distinct sets of results.

%-----------------------------

idx = 1 1 1 2 2 1    c = 0.3658 0.6270    rate = 0.5833

idx = 1 1 1 1 1 2    c = 0.5109 0.1626    rate = 3.1419

idx = 2 2 2 1 1 2    c = 0.6270 0.3658    rate = 1.7143

idx = 2 2 2 2 2 1    c = 0.1626 0.5109    rate = 0.3183

%-----------------------------

Can anybody tell me how to avoid this uncertainty? By the way, my MATLAB version is R2008a.

Thank you in advance for any response.

Best regards,

Jean


Answer 1

Mahmoud Hammoud, 24 February 2011


https://kr.mathworks.com/matlabcentral/answers/1921-how-to-avoid-uncertainty-in-processing-result-of-matlab-statistics-toolbox#answer_2873

This is expected behavior: by default, KMEANS selects the initial cluster centroid positions at random from the observations. That is, the 'start' parameter defaults to 'sample', as the documentation shows. If you run your code enough times, you will also see another outcome: KMEANS errors out because an empty cluster is created at the first iteration (i.e., idx is all 1's or all 2's). To make the result repeatable, you can pass a matrix of initial centroid positions as the value of the 'start' parameter, for example:

[idx c] = kmeans(a,2,'start',[0 0.5]')

This yields the same result every time. However, because the partition returned by KMEANS depends heavily on the initial centroid positions, you would probably get a sub-optimal partition (unless you provide a "lucky" vector for the 'start' parameter). The typical use of KMEANS is to set the 'Replicates' parameter to an integer n, the number of times to repeat the clustering. KMEANS then returns the partition with the lowest sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances.
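As a rough sketch of the 'Replicates' approach applied to the data from the question: the replicate count of 10, the 'EmptyAction' setting, and the variable names order and newLabel are illustrative assumptions, not something the original code used. The sketch also assumes a is stored as a column vector. Because cluster numbering is arbitrary, the centroids are sorted before the ratio is formed; otherwise c(1)/c(2) can still flip to its reciprocal even when the partition itself is the same.

```matlab
% Data from the question, stored as a column (each row is one observation).
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';

% Repeat the clustering 10 times (an arbitrary choice) and keep the
% partition with the lowest total within-cluster distance.
% 'EmptyAction','singleton' replaces an empty cluster with a new
% single-point cluster instead of raising an error.
[idx, c] = kmeans(a, 2, 'Replicates', 10, 'EmptyAction', 'singleton');

% Cluster labels are arbitrary, so put the centroids in ascending order
% and relabel idx to match before computing the ratio.
[c, order] = sort(c);               % c(1) is now the smaller centroid
newLabel = zeros(size(order));
newLabel(order) = 1:numel(order);   % old label -> sorted label
idx = newLabel(idx);

rate = c(1) / c(2)                  % smaller centroid over larger centroid
```

With the centroids sorted, rate is always the smaller centroid divided by the larger one, so it no longer depends on which cluster happens to be numbered 1. Replicates make the best partition very likely but do not strictly guarantee it; passing a fixed 'start' matrix remains the fully deterministic option.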

Comment

jean young, 25 February 2011

Thank you very much! I have modified my program and the problem is solved.
