What is the correct input for the Two-sample Kolmogorov-Smirnov test, when I need to compare two histograms?

Example 1. If the input data are quite similar, there seems to be no practical difference in the result of the two-sample Kolmogorov-Smirnov test (h = 0 in both cases) whether I use the original data "X" and "Y" or the bin counts "NX" and "NY" (returned by the histcounts function) as inputs:
% inputs
X = [2 5 7 10 11 12 13 14 16 17 18 19 22 23 24 29];
Y = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29];
[NX,edgesX] = histcounts(X,'NumBins',6);
[NY,edgesY] = histcounts(Y,'NumBins',6);
% plot
hold on
histogram(X,edgesX,'FaceAlpha',0.1,'EdgeAlpha',0.8)
histogram(Y,edgesY,'FaceAlpha',0.1,'EdgeAlpha',0.2)
% Two-sample Kolmogorov-Smirnov test
[h1,p1,ks2stat1] = kstest2(X,Y); % <-- By using the original input data
[h2,p2,ks2stat2] = kstest2(NX,NY); % <-- By using the Bin counts "NX" and "NY", returned from the "histcounts" function
table([[h1,p1,ks2stat1];[h2,p2,ks2stat2]] ,'VariableNames', {'h | p | ks2stat'},'RowNames', {'kstest2(X,Y)', 'kstest2(NX,NY)'})
% Result of Example 1
ans =
  2×1 table
                             h | p | ks2stat
                      _____________________________________________
    kstest2(X,Y)         0    0.999035232339821    0.125
    kstest2(NX,NY)       0    0.999956514899259    0.166666666666667
Example 2. If the input data differ a bit more, the result of the two-sample Kolmogorov-Smirnov test (the p-value in particular) does change depending on whether I use the original data "X" and "Y" or the bin counts "NX" and "NY" (returned by the histcounts function) as inputs:
% inputs
X = [2 5 7 10 11 12 13 14 16 17 18 19 22 23 24 29];
Y = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29 29 29 29 29 29 29 29];
[NX,edgesX] = histcounts(X,'NumBins',6);
[NY,edgesY] = histcounts(Y,'NumBins',6);
% plot
hold on
histogram(X,edgesX,'FaceAlpha',0.1,'EdgeAlpha',0.8)
histogram(Y,edgesY,'FaceAlpha',0.1,'EdgeAlpha',0.2)
% Two-sample Kolmogorov-Smirnov test
[h1,p1,ks2stat1] = kstest2(X,Y);
[h2,p2,ks2stat2] = kstest2(NX,NY);
table([[h1,p1,ks2stat1];[h2,p2,ks2stat2]] ,'VariableNames', {'h | p | ks2stat'},'RowNames', {'kstest2(X,Y)', 'kstest2(NX,NY)'})
% Result of Example 2
ans =
  2×1 table
                             h | p | ks2stat
                      _____________________________________________
    kstest2(X,Y)         0    0.251817384522441    0.315217391304348
    kstest2(NX,NY)       0    0.809557310616653    0.333333333333333
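If only histograms are available, a closer analogue of the KS statistic compares the normalized cumulative histograms instead of feeding the counts to kstest2. A minimal sketch, assuming both samples are binned on the same edges (the edges 0:5:30 are a hypothetical choice for illustration, not the NumBins=6 edges used above), using the Example 2 data:

```matlab
% Example 2 data from the question
X = [2 5 7 10 11 12 13 14 16 17 18 19 22 23 24 29];
Y = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29 29 29 29 29 29 29 29];

% Hypothetical common edges, chosen for illustration only
edges = 0:5:30;

% Bin both samples on the SAME edges so the bins are comparable
NX = histcounts(X, edges);
NY = histcounts(Y, edges);

% Normalized cumulative histograms = empirical CDFs sampled just below each edge
cdfX = cumsum(NX) / sum(NX);
cdfY = cumsum(NY) / sum(NY);

% KS-like statistic on the binned data (never larger than the raw-data statistic)
Dbinned = max(abs(cdfX - cdfY))   % 29/92, i.e. about 0.3152, for these data
```

Because the cumulative histogram only samples the empirical CDFs at the bin edges, this binned statistic cannot exceed the one computed from the raw data; here it happens to equal the 0.3152 from kstest2(X,Y), because the largest gap (at 19) lies just below the edge at 20. The p-value that kstest2 reports assumes unbinned continuous samples, so, as far as I know, it should not be applied to this binned statistic without adjustment.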
5 Comments
Sim on 16 Jul 2024 at 11:21
I have just checked the source code of kstest2(x1,x2) by opening it from the Command Window:
>> open kstest2
I can see that the bin counts are already computed in the part where the empirical CDFs are derived:
% Calculate F1(x) and F2(x), the empirical (i.e., sample) CDFs.
...
binCounts1 = histc (x1 , binEdges, 1);
binCounts2 = histc (x2 , binEdges, 1);
...
Therefore, to the best of my understanding, the correct way to call kstest2(x1,x2) is with the original input data "X" and "Y" only:
[h1,p1,ks2stat1] = kstest2(X,Y); % <-- By using the original input data
whereas passing the bin counts "NX" and "NY" would give a wrong result, because kstest2 would then treat the six counts themselves as six observations per sample:
[h2,p2,ks2stat2] = kstest2(NX,NY); % <-- By using the Bin counts "NX" and "NY", returned from the "histcounts" function
Anyone who can confirm my reasoning is very welcome to do so!
(Perhaps the @MathWorks Support Team as well?)
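For reference, the statistic that kstest2 reports can be reproduced in base MATLAB: it is the largest absolute gap between the two empirical CDFs, evaluated at every pooled observation. The sketch below is not the toolbox code (it uses comparisons instead of histc); the anonymous functions ecdfAt, pooledPts and ks2 are hypothetical helpers, and implicit expansion (R2016b or later) is assumed. Run on the raw data, it should give the ks2stat values 0.125 and 0.315217391304348 that kstest2(X,Y) returned in the two examples.

```matlab
% Hypothetical helpers (not part of any toolbox); need implicit expansion (R2016b+)
% ecdfAt(x, pts): fraction of the values in x that are <= each point in pts
ecdfAt    = @(x, pts) sum(x(:).' <= pts(:), 2) / numel(x);
% pooledPts(a, b): sorted distinct values of both samples (evaluation points)
pooledPts = @(a, b) unique([a(:); b(:)]);
% ks2(a, b): two-sided two-sample KS statistic, max |F1(t) - F2(t)|
ks2       = @(a, b) max(abs(ecdfAt(a, pooledPts(a, b)) - ecdfAt(b, pooledPts(a, b))));

X  = [2 5 7 10 11 12 13 14 16 17 18 19 22 23 24 29];
Y1 = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29];                        % Example 1
Y2 = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29 29 29 29 29 29 29 29];   % Example 2

D1 = ks2(X, Y1)   % 0.125, same as ks2stat1 in Example 1
D2 = ks2(X, Y2)   % 0.3152 (29/92), same as ks2stat1 in Example 2
```

Fed NX and NY instead, the same computation would operate on six count values per sample, which is why both ks2stat values obtained from the bin counts (0.1667 and 0.3333) are multiples of 1/6.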
Sim on 16 Jul 2024 at 11:24
Edited: Sim on 16 Jul 2024 at 11:25
Yes @Divyam, you are right! I just need to use the original data "X" and "Y" (what you called the "vectors themselves")!! ...and kstest2 will do the rest of the work for me :-)
Many thanks! :-) :-)

Answers (0)
