What is the correct input for the Two-sample Kolmogorov-Smirnov test, when I need to compare two histograms?
Example 1. If the input data are quite similar, the two-sample Kolmogorov-Smirnov test seems to give practically the same result whether I pass the original data, "X" and "Y", or the bin counts, "NX" and "NY" (returned by the histcounts function), as inputs:
% inputs
X = [2 5 7 10 11 12 13 14 16 17 18 19       22 23 24 29];
Y = [2 5      11 12 13 14 16 17 18 19 20 21 22 23 24 29];
[NX,edgesX] = histcounts(X,'NumBins',6);
[NY,edgesY] = histcounts(Y,'NumBins',6);
% plot
hold on
histogram(X,edgesX,'FaceAlpha',0.1,'EdgeAlpha',0.8)
histogram(Y,edgesY,'FaceAlpha',0.1,'EdgeAlpha',0.2)
% Two-sample Kolmogorov-Smirnov test
[h1,p1,ks2stat1] = kstest2(X,Y);     % <-- By using the original input data
[h2,p2,ks2stat2] = kstest2(NX,NY);   % <-- By using the Bin counts "NX" and "NY", returned from the "histcounts" function
table([[h1,p1,ks2stat1];[h2,p2,ks2stat2]] ,'VariableNames', {'h | p | ks2stat'},'RowNames', {'kstest2(X,Y)', 'kstest2(NX,NY)'})

% Result of Example 1
ans =
  2×1 table
                                    h | p | ks2stat              
                      ___________________________________________
    kstest2(X,Y)      0    0.999035232339821                0.125
    kstest2(NX,NY)    0    0.999956514899259    0.166666666666667
Example 2. If the input data differ a bit more, the output of the two-sample Kolmogorov-Smirnov test changes depending on whether I pass the original data, "X" and "Y", or the bin counts, "NX" and "NY" (returned by the histcounts function), as inputs:
% inputs
X = [2 5 7 10 11 12 13 14 16 17 18 19       22 23 24 29];
Y = [2 5      11 12 13 14 16 17 18 19 20 21 22 23 24 29 29 29 29 29 29 29 29];
[NX,edgesX] = histcounts(X,'NumBins',6);
[NY,edgesY] = histcounts(Y,'NumBins',6);
% plot
hold on
histogram(X,edgesX,'FaceAlpha',0.1,'EdgeAlpha',0.8)
histogram(Y,edgesY,'FaceAlpha',0.1,'EdgeAlpha',0.2)
% Two-sample Kolmogorov-Smirnov test
[h1,p1,ks2stat1] = kstest2(X,Y);
[h2,p2,ks2stat2] = kstest2(NX,NY);
table([[h1,p1,ks2stat1];[h2,p2,ks2stat2]] ,'VariableNames', {'h | p | ks2stat'},'RowNames', {'kstest2(X,Y)', 'kstest2(NX,NY)'})

% Result of Example 2
ans =
  2×1 table
                                    h | p | ks2stat              
                      ___________________________________________
    kstest2(X,Y)      0    0.251817384522441    0.315217391304348
    kstest2(NX,NY)    0    0.809557310616653    0.333333333333333
Answers (1)
Divyam on 13 Sep 2024
The two-sample Kolmogorov-Smirnov test checks whether the data in two vectors come from the same continuous distribution.
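As a rough illustration of what the test measures: the statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes it by hand for the "X" and "Y" of Example 1 using only base MATLAB. The variable names pts, FX, FY and D are my own, and the code assumes implicit expansion is available (R2016b or later).

```matlab
% Data from Example 1 of the question
X = [2 5 7 10 11 12 13 14 16 17 18 19 22 23 24 29];
Y = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29];

% Evaluate both empirical CDFs at every observed value
pts = unique([X Y]);                    % 1-by-K row of evaluation points
FX  = sum(X(:) <= pts, 1) / numel(X);   % fraction of X at or below each point
FY  = sum(Y(:) <= pts, 1) / numel(Y);   % fraction of Y at or below each point

% KS statistic: maximum absolute difference between the two ECDFs
D = max(abs(FX - FY));
fprintf('D = %.3f\n', D);               % 0.125, the ks2stat reported for kstest2(X,Y)
```

The gap of 2/16 first appears at the value 10, where "X" already has four observations and "Y" only two.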
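To apply the test correctly, pass the original vectors "X" and "Y" to kstest2: they are the samples from which the histograms are built, so they represent the distributions directly. The bin counts "NX" and "NY" only summarize the shape of each distribution. kstest2 treats every input as a vector of observations, so kstest2(NX,NY) compares the distribution of the six count values rather than the distribution of the data. The two calls can happen to agree when the data sets are similar (Example 1), but they diverge when the data differ (Example 2, p = 0.25 for the raw data versus p = 0.81 for the bin counts).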
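One way to see that the bin counts are not a meaningful input: kstest2 works on the sorted values of each sample, so it has no notion of which bin a count belongs to. The sketch below reuses the data of Example 2, assumes the Statistics and Machine Learning Toolbox is installed, and mirrors the histogram of "X" before testing. The names NXflipped, ksOrig, ksFlipped, pOrig and pFlipped are illustrative only.

```matlab
% Data from Example 2 of the question
X = [2 5 7 10 11 12 13 14 16 17 18 19 22 23 24 29];
Y = [2 5 11 12 13 14 16 17 18 19 20 21 22 23 24 29 29 29 29 29 29 29 29];

NX = histcounts(X,'NumBins',6);
NY = histcounts(Y,'NumBins',6);

% Mirror the histogram of X: the bin order changes, the set of count values does not
NXflipped = fliplr(NX);

[~,pOrig,ksOrig]       = kstest2(NX,NY);
[~,pFlipped,ksFlipped] = kstest2(NXflipped,NY);

% Both calls see the same six count values per sample, so the results coincide
fprintf('original: p = %.4f, D = %.4f\n', pOrig, ksOrig);
fprintf('flipped : p = %.4f, D = %.4f\n', pFlipped, ksFlipped);
```

Because each sample passed this way has only six elements, the statistic can only be a multiple of 1/6, which is consistent with the values 0.1667 and 0.3333 reported for kstest2(NX,NY) in the question.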
Note: I am answering this question to make it easier to find for anyone who comes to the community with a similar query.