Using chi2gof to test two distributions

Allie on 6 Feb 2019
Edited: Sim on 14 Aug 2024
I want to use chi2gof to test whether two distributions come from a common distribution (null hypothesis) or not (alternative hypothesis). I have binned observational data (x), binned model data (y), and the bin edges (bins). Both x and y are counts per bin.
x= [41 22 11 10 9 5 2 3 2]
y= [38.052 24.2655 15.4665 9.8595 6.2895 4.011 2.562 1.6275 2.8665]
bins=[0:9:81]
Because the data is already binned and I'm testing x against y, I used the following code:
[h,p,stat]=chi2gof(x,'Edges',bins,'Expected',y)
Manual calculation of the chi2 test statistic gives 4.6861 with a probability of p = 0.7905. The function call above, however, produces a very different result: the resulting stats show different bin edges than the ones I specified, the observed counts per bin do not match x, the chi2 test statistic is ~87, and p < 0.001. Could someone please explain why I'm getting such dramatically different results?
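For reference, the manual calculation mentioned above can be sketched as follows. This is a minimal sketch that assumes the degrees of freedom equal the number of bins minus one (no parameters estimated from the data), which is the choice that reproduces the quoted p-value. The variable names chi2stat, df, and p are illustrative, and gammainc is used only so the snippet runs without a toolbox.

```matlab
% Observed and expected counts per bin, copied from the question.
x = [41 22 11 10 9 5 2 3 2];
y = [38.052 24.2655 15.4665 9.8595 6.2895 4.011 2.562 1.6275 2.8665];

% Pearson chi-square statistic: sum over bins of (O - E)^2 / E.
chi2stat = sum((x - y).^2 ./ y);

% Assumption: df = number of bins - 1 (no fitted parameters).
df = numel(x) - 1;

% Upper-tail chi-square probability via the regularized upper incomplete
% gamma function from base MATLAB. With the Statistics and Machine
% Learning Toolbox, chi2cdf(chi2stat, df, 'upper') is the equivalent call.
p = gammainc(chi2stat/2, df/2, 'upper');

% Prints values close to 4.686 and 0.79 for these inputs.
fprintf('chi2stat = %.4f, df = %d, p = %.4f\n', chi2stat, df, p);
```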

Accepted Answer

Jeff Miller on 7 Feb 2019
Sorry, the x's really do have to be the data values. Try this:
bins=[0:9:81]
xvals = bins(1:end-1)+4.5; % Here are some fake data values that belong in each bin.
xcounts= [41 22 11 10 9 5 2 3 2] % These are the counts of the data values in each bin.
y= [38.052 24.2655 15.4665 9.8595 6.2895 4.011 2.562 1.6275 2.8665];
[h,p,stat]=chi2gof(xvals,'Edges',bins,'Expected',y,'Frequency',xcounts,'EMin',1)
This will give you your 4.68. By default, chi2gof pools bins whose expected count is below 5. Setting 'EMin' to 1 lowers that threshold, and since the smallest expected count here is 1.6275, none of the bins are merged.
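To see how much the default pooling changes the result, here is a hedged sketch that merges the bins by hand. It assumes the last four bins (45 to 81) are the ones that get pooled, which is what the edges [0 9 18 27 36 45 81] and the final expected value 11.0670 in the stat output posted elsewhere in this thread suggest. It does not call chi2gof and is not a description of its internal pooling rule. The names tail, Opooled, and Epooled are illustrative.

```matlab
% Observed and expected counts per bin, copied from the question.
x = [41 22 11 10 9 5 2 3 2];
y = [38.052 24.2655 15.4665 9.8595 6.2895 4.011 2.562 1.6275 2.8665];

% Assumption: bins 6 to 9 are merged into a single bin covering 45 to 81.
tail = 6:9;
Opooled = [x(1:5) sum(x(tail))];
Epooled = [y(1:5) sum(y(tail))];

% Statistic and degrees of freedom with the pooled bins.
chi2Pooled = sum((Opooled - Epooled).^2 ./ Epooled);
dfPooled = numel(Opooled) - 1;

% Statistic and degrees of freedom with all nine bins kept separate.
chi2Full = sum((x - y).^2 ./ y);
dfFull = numel(x) - 1;

fprintf('pooled:   chi2 = %.4f, df = %d\n', chi2Pooled, dfPooled);
fprintf('unpooled: chi2 = %.4f, df = %d\n', chi2Full, dfFull);
```

The two versions test the same counts but with different bin layouts, so both the statistic and the degrees of freedom differ. That is why the call has to use 'EMin' to match a hand calculation done on all nine bins.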
  2 Comments
Allie on 7 Feb 2019
This worked! Thank you
Sim on 29 Jul 2024


More Answers (2)

Jeff Miller on 6 Feb 2019
It looks like chi2gof expects the values in x to be the actual, original scores, not the bin counts. Try adding 'Frequency',x to the parameter list.
  1 Comment
Allie on 7 Feb 2019
Edited: Allie on 7 Feb 2019
This did not work. The stat output is below. As you can see, it changed the edges and expected values from what I originally input and the chi2stat became even bigger.
stat =
chi2stat: 234.4383
df: 5
edges: [0 9 18 27 36 45 81]
O: [12 30 22 0 41 0]
E: [38.0520 24.2655 15.4665 9.8595 6.2895 11.0670]
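One plausible reading of the O values above, offered as a sketch rather than a statement about chi2gof internals: with 'Frequency',x the counts in x are still treated as data values, each weighted by itself. The value 41 lands in the bin from 36 to 45 with weight 41, the values 11, 10, and 9 land in the bin from 9 to 18 with total weight 30, and so on. The snippet below reproduces that binning with discretize and accumarray, which are base MATLAB functions chosen for illustration and not taken from this thread.

```matlab
% Bin edges and counts from the question.
bins = 0:9:81;
x = [41 22 11 10 9 5 2 3 2];

% Treat each entry of x as a data value and find the bin it falls into.
binIdx = discretize(x, bins);

% Sum the weights (x again, as passed via 'Frequency') within each bin.
O = accumarray(binIdx(:), x(:), [numel(bins)-1, 1]).';

disp(O)   % 12 30 22 0 41 0 0 0 0
```

Merging the last four bins of this vector into one gives [12 30 22 0 41 0], which matches the reported O. That is consistent with the counts having been binned as if they were raw scores.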



Sim on 14 Aug 2024
Edited: Sim on 14 Aug 2024
Shouldn't you use the two-sample chi-square test?
The chi-squared test works on binned data. However, as far as I understand, CHI2TEST2 expects the raw data, not the binned data, as its inputs.
Indeed, CHI2TEST2 places the raw data into bins:
bins = unique([x1(:,1); x2(:,1)]); % create a bin for each unique value
