
Why does chi2gof reject the normal hypothesis when given normally distributed data?

Chris Loyt on 11 Jun 2021
Edited: Chris Loyt on 11 Jun 2021
Hi,
I was trying to use the chi2gof function. For this I created normally distributed data, so I assumed that chi2gof would return a small p-value (to reject the null hypothesis). But no matter how much I increase the sample size, the function does not reject. Furthermore, when plotted, the data are obviously normally distributed in my opinion (assuming that makedist works correctly).
Can anyone explain this behaviour? Am I making a mistake? Or am I misjudging the purpose of this test?
% create normally distributed data
pd = makedist('Normal');
rng default; % for reproducibility
x = random(pd,5000,1);
% display data with histfit
histfit(x)
% run chi2gof
[h,p,stat] = chi2gof(x)
h =
0
p =
0.1967
Edit: I mixed up H0 and H1, so everything is as expected.
%% chi-squared tests for goodness of fit
% test if the data follows a defined distribution
% create normally distributed data
pd = makedist('Normal');
rng default; % for reproducibility
x = random(pd,5000,1);
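% add strong linear effect -> data won't follow normal distribution
% (parentheses and transpose are needed: x + +1:5000 parses as (x + 1):5000;
% the output posted below comes from that original line, so the exact p-value may differ)
x = x + (1:5000)';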
histfit(x)
% run chi2gof
[h,p,stat] = chi2gof(x)
% result
h =
1
p =
2.1686e-68
So h == 1 means the test rejects H0: the data do not follow the expected (normal) distribution.
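As a minimal sketch of how the two outputs relate, the snippet below reuses the data from the first example and makes the significance level explicit through the 'Alpha' option (0.05 is also the default). The variable name alpha and the printed messages are illustrative choices, not part of the original post.

```matlab
% Same data as in the first example above
pd = makedist('Normal');
rng default;                       % for reproducibility
x = random(pd,5000,1);

alpha = 0.05;                      % significance level (assumed; chi2gof's default)
[h,p] = chi2gof(x,'Alpha',alpha);  % H0: x comes from a normal distribution

% h is the test decision: 1 when the p-value is below alpha (reject H0),
% 0 when there is not enough evidence to reject H0
if h
    fprintf('p = %.4g: reject H0 at the %.0f%% level\n', p, 100*alpha);
else
    fprintf('p = %.4g: cannot reject H0 at the %.0f%% level\n', p, 100*alpha);
end
```

Note that h == 0 does not prove normality; it only says the data give no significant evidence against it.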

Answers (1)

Chunru on 11 Jun 2021
You need an even bigger sample size to achieve a high p-value.
pd = makedist('Normal');
rng default; % for reproducibility
x = random(pd,500000,1);
% display data with histfit
histfit(x)
% run chi2gof
[h,p,stat] = chi2gof(x)
h = 0
p = 0.7489
stat = struct with fields:
    chi2stat: 4.2642
          df: 7
       edges: [-4.4336 -3.5261 -2.6185 -1.7109 -0.8034 0.1042 1.0117 1.9193 2.8268 3.7344 4.6419]
           O: [104 2064 19591 83525 165403 151255 64279 12649 1079 51]
           E: [104.6820 2.0938e+03 1.9528e+04 8.3633e+04 1.6537e+05 1.5140e+05 6.4161e+04 1.2540e+04 1.1237e+03 46.7105]
Comments: 1
Chris Loyt on 11 Jun 2021
Edited: Chris Loyt on 11 Jun 2021
Thank you for your answer! In your answer the null hypothesis is not rejected either:
p = 0.7489 -> the p-value increased.
From what I know, the p-value should drop below 0.05 (depending on the significance level) to reject H0.
Edit: I just checked: I mixed up H0 and H1 in my head, so everything is working as expected. H0 means that the data actually come from a normal distribution, and H1 (i.e. rejecting H0) would indicate that another distribution is needed.
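To see what "working as expected" looks like when H0 is true, one can repeat the test on many independent normal samples and count how often H0 is rejected. The sketch below is an illustration only: the number of repetitions, the sample sizes, and the variable names are arbitrary assumptions. If the chi-square approximation holds, the rejection rate should stay near alpha for every sample size, since individual p-values vary from sample to sample.

```matlab
rng default;                    % for reproducibility
alpha = 0.05;                   % significance level
nRep  = 200;                    % simulated data sets per sample size (arbitrary)
sizes = [500 5000 50000];       % sample sizes to compare (arbitrary)
rejectRate = zeros(size(sizes));

for i = 1:numel(sizes)
    hAll = zeros(nRep,1);
    for k = 1:nRep
        x = randn(sizes(i),1);               % H0 is true: data are normal
        hAll(k) = chi2gof(x,'Alpha',alpha);  % 1 = reject H0, 0 = do not reject
    end
    rejectRate(i) = mean(hAll);              % fraction of false rejections
    fprintf('n = %6d: H0 rejected in %.1f%% of runs\n', sizes(i), 100*rejectRate(i));
end
```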
