In a chi-square test, how do I calculate the correct number of estimated parameters, and hence the correct number of degrees of freedom, without using the chi2gof function?
Question
I noticed that the number of degrees of freedom differs slightly between one MATLAB Answers post and the chi2gof function.
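% Counts per category; the sparse tail categories are pooled by hand
% so that every expected count is large enough for the chi-square approximation.
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
Population2 = [996, 749, 370, sum(Population(4:8))];
Sample2 = [647, 486, 100, sum(Sample(4:8))];
% Expected counts must have the same total as the observed sample, so scale the
% population proportions to the sample size (sum(Sample2) ~= sum(Population2)).
Expected2 = sum(Sample2) * Population2 / sum(Population2);
chi2stat = sum((Sample2 - Expected2).^2 ./ Expected2);
% No parameters are estimated from Sample, so nparams = 0 and df = nbins - 1.
df = length(Population2) - 1;
alpha = .05;
chi2crit = chi2inv(1 - alpha, df);    % upper critical value (not chi2inv(alpha, df))
h2 = chi2stat > chi2crit;
p2 = chi2cdf(chi2stat, df, 'upper');  % same as 1 - chi2cdf(...), but more accurate
fprintf('h=%d, p=%.3f df=%d\n', h2, p2, df);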
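In the usual notation, with O_i and E_i the observed and expected counts in bin i after any pooling (symbols introduced here for illustration), the statistic computed above and its approximate reference distribution under the null hypothesis are:

```latex
\chi^2 = \sum_{i=1}^{n_{\text{bins}}} \frac{(O_i - E_i)^2}{E_i},
\qquad
\chi^2 \overset{H_0}{\approx} \chi^2_{\nu},
\qquad
\nu = n_{\text{bins}} - 1 - n_{\text{params}}
```

For the pooled Population2/Sample2 data, n_bins = 4. Assuming Population is a fixed reference that is not estimated from Sample, n_params = 0, so ν = 4 - 1 - 0 = 3, which is what length(Population2)-1 computes.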
"chi2gof compares the value of the test statistic to a chi-square distribution with degrees of freedom equal to nbins - 1 - nparams, where nbins is the number of bins used for the data pooling and nparams is the number of estimated parameters used to determine the expected counts."
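% Test whether the counts follow a Poisson distribution
bins = 0:5;                       % bin centers (values 0..5)
obsCounts = [6 16 10 12 4 2];     % observed count in each bin
n = sum(obsCounts);
% lambda is estimated from these same counts, hence 'NParams',1 below
pd = fitdist(bins','Poisson','Frequency',obsCounts');
expCounts = n * pdf(pd,bins);     % expected count in each bin under the fitted Poisson
[h,p,st] = chi2gof(bins,'Ctrs',bins,...
    'Frequency',obsCounts, ...
    'Expected',expCounts,...
    'NParams',1)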
Answers (1)
dpb
21 June 2023
Although you specified 'Ctrs',bins, chi2gof created only 5 bins. With the fitted Poisson (lambda = 1.96), the expected counts for the last two bins (about 4.33 and 1.70) are each below the default minimum expected count per bin ('EMin', 5), so chi2gof pooled them into a single bin. Hence the DOF for the chi-square test statistic is 5-1-1 = 3 rather than the 6-1-1 = 4 you may have been expecting.
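To get the same degrees of freedom without chi2gof, the pooling has to be done by hand before counting bins. The sketch below assumes a simplified rule (merge tail bins whose expected count is below 5, the default 'EMin') that is meant to mimic, not reproduce, chi2gof's internal pooling. It computes the Poisson expected counts directly (the Poisson MLE of lambda is the sample mean) instead of calling fitdist, and uses base-MATLAB gammainc for the upper-tail p-value so it does not need chi2cdf. The variable EMin and the loop structure are illustrative.

```matlab
% Same counts as the Poisson example in the question
bins      = 0:5;
obsCounts = [6 16 10 12 4 2];
n         = sum(obsCounts);

% Poisson MLE of lambda is the sample mean -> one parameter estimated from the data
lambda    = sum(bins .* obsCounts) / n;              % 98/50 = 1.96
expCounts = n * exp(-lambda) * lambda.^bins ./ factorial(bins);

% Simplified tail pooling (assumption: approximates chi2gof's EMin rule)
EMin = 5;
O = obsCounts;  E = expCounts;
while numel(E) > 1 && E(end) < EMin                 % merge from the right tail
    E(end-1) = E(end-1) + E(end);  E(end) = [];
    O(end-1) = O(end-1) + O(end);  O(end) = [];
end
while numel(E) > 1 && E(1) < EMin                   % merge from the left tail
    E(2) = E(2) + E(1);  E(1) = [];
    O(2) = O(2) + O(1);  O(1) = [];
end

nparams  = 1;                                        % lambda was estimated
nbins    = numel(E);                                 % bins left after pooling
df       = nbins - 1 - nparams;
chi2stat = sum((O - E).^2 ./ E);
p        = gammainc(chi2stat/2, df/2, 'upper');      % chi-square upper-tail probability

fprintf('nbins=%d, df=%d, chi2stat=%.4f, p=%.4f\n', nbins, df, chi2stat, p);
```

For these counts only the right-tail loop does anything: it merges the last two bins, leaving 5 bins and df = 5 - 1 - 1 = 3, and chi2stat and p should agree with st.chi2stat and p from the chi2gof call in the question.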
dpb
22 June 2023
Edited: dpb on 23 June 2023
I'm not sure what the remaining puzzle is, so I don't know what to add beyond what I've already said.
Beyond the DOF implied by the number of (pooled) bins, nbins - 1, the only correction is to subtract the number of distribution parameters that were estimated from the data itself in order to compute the expected count per bin. That holds IF ("the big if") the parameter values of the theoretical distribution actually come from the input data.
If you test against counts from a theoretical distribution that is obtained from other considerations, then you've not estimated any further parameters from the count data itself and nParams=0.
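As an illustration of the nParams = 0 case, the sketch below tests the same Poisson counts against a distribution whose lambda is fixed in advance. The value lambda0 = 2 is hypothetical, chosen only for this example; because nothing is estimated from obsCounts, no parameter is subtracted. The last two bins are pooled by hand, in the same way the question pools Population(4:8), because their expected counts (about 4.51 and 1.80) are below 5.

```matlab
% Same observed counts, tested against a Poisson whose lambda is NOT estimated
% from these data (lambda0 = 2 is a hypothetical a-priori value)
bins      = 0:5;
obsCounts = [6 16 10 12 4 2];
n         = sum(obsCounts);
lambda0   = 2;
expCounts = n * exp(-lambda0) * lambda0.^bins ./ factorial(bins);

% Pool the two sparse tail bins by hand
O = [obsCounts(1:4), sum(obsCounts(5:6))];
E = [expCounts(1:4), sum(expCounts(5:6))];

nparams  = 0;                                    % nothing estimated from obsCounts
df       = numel(E) - 1 - nparams;               % 5 - 1 - 0 = 4
chi2stat = sum((O - E).^2 ./ E);
p        = gammainc(chi2stat/2, df/2, 'upper');  % chi-square upper-tail probability
fprintf('df=%d, chi2stat=%.4f, p=%.4f\n', df, chi2stat, p);
```

The bin count is the same as in the fitted case (5 bins), but df is 4 instead of 3 because lambda0 did not come from the data.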