
# kstest2

Two-sample Kolmogorov-Smirnov test

## Description


h = kstest2(x1,x2) returns a test decision for the null hypothesis that the data in vectors x1 and x2 are from the same continuous distribution, using the two-sample Kolmogorov-Smirnov test. The alternative hypothesis is that x1 and x2 are from different continuous distributions. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, and 0 otherwise.


h = kstest2(x1,x2,Name,Value) returns a test decision for a two-sample Kolmogorov-Smirnov test with additional options specified by one or more name-value pair arguments. For example, you can change the significance level or conduct a one-sided test.


[h,p] = kstest2(___) also returns the asymptotic p-value p, using any of the input arguments from the previous syntaxes.


[h,p,ks2stat] = kstest2(___) also returns the test statistic ks2stat.

## Examples


### Test Two Samples for the Same Distribution

Generate sample data from two different Weibull distributions.

```matlab
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);
```

Test the null hypothesis that the data in vectors x1 and x2 come from populations with the same distribution.

```matlab
h = kstest2(x1,x2)
```
```
h =
1
```

The returned value of h = 1 indicates that kstest2 rejects the null hypothesis at the default 5% significance level.

### Test the Hypothesis at Different Significance Levels

Generate sample data from two different Weibull distributions.

```matlab
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);
```

Test the null hypothesis that data vectors x1 and x2 are from populations with the same distribution at the 1% significance level.

```matlab
[h,p] = kstest2(x1,x2,'Alpha',0.01)
```
```
h =
0
p =
0.0317
```

The returned value of h = 0 indicates that kstest2 does not reject the null hypothesis at the 1% significance level, because p = 0.0317 is greater than 0.01.

### One-Sided Hypothesis Test

Generate sample data from two different Weibull distributions.

```matlab
rng(1);     % For reproducibility
x1 = wblrnd(1,1,1,50);
x2 = wblrnd(1.2,2,1,50);
```

Test the null hypothesis that the data in vectors x1 and x2 come from populations with the same distribution, against the alternative hypothesis that the cdf of the distribution of x1 is larger than the cdf of the distribution of x2.

```matlab
[h,p,k] = kstest2(x1,x2,'Tail','larger')
```
```
h =
1
p =
0.0158
k =
0.2800
```

The returned value of h = 1 indicates that kstest2 rejects the null hypothesis at the default 5% significance level, in favor of the alternative hypothesis that the cdf of the distribution of x1 is larger than the cdf of the distribution of x2. The returned value of k is the test statistic for the two-sample Kolmogorov-Smirnov test.

## Input Arguments


### x1 — Sample data

vector

Sample data from the first sample, specified as a vector. Data vectors x1 and x2 do not need to be the same size.

Data Types: single | double

### x2 — Sample data

vector

Sample data from the second sample, specified as a vector. Data vectors x1 and x2 do not need to be the same size.

Data Types: single | double

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Alpha',0.01,'Tail','larger'

### 'Alpha' — Significance level

0.05 (default) | scalar value in the range (0,1)

Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).

Example: 'Alpha',0.01

Data Types: single | double

### 'Tail' — Type of alternative hypothesis

'unequal' (default) | 'larger' | 'smaller'

Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.

| Value | Alternative hypothesis |
|---|---|
| 'unequal' | Test the alternative hypothesis that the empirical cdf of x1 is unequal to the empirical cdf of x2. |
| 'larger' | Test the alternative hypothesis that the empirical cdf of x1 is larger than the empirical cdf of x2. |
| 'smaller' | Test the alternative hypothesis that the empirical cdf of x1 is smaller than the empirical cdf of x2. |

If the data values in x1 tend to be larger than those in x2, the empirical distribution function of x1 tends to be smaller than that of x2, and vice versa.
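To see the direction of this relationship concretely, the sketch below builds two small hypothetical samples in which every x1 value is an x2 value shifted to the right, evaluates both empirical cdfs at the pooled sample points, and checks that the empirical cdf of x1 never rises above that of x2. It uses only base MATLAB (implicit expansion, so R2016b or later) rather than kstest2, and the anonymous function ecdfAt is a hypothetical helper, not a toolbox function. Following the table above, data like these point toward the 'smaller' alternative.

```matlab
% Hypothetical data: every x1 value is an x2 value shifted right by 3.
x2 = 1:10;
x1 = x2 + 3;

% Hypothetical helper: empirical cdf of sample s at points x,
% that is, the proportion of s values less than or equal to each x.
ecdfAt = @(s, x) mean(s(:) <= x(:).', 1);

% Evaluate both empirical cdfs at every pooled sample point.
xGrid = unique([x1 x2]);
F1 = ecdfAt(x1, xGrid);
F2 = ecdfAt(x2, xGrid);

% Larger x1 values push the empirical cdf of x1 down, so it never exceeds F2.
f1NeverAbove = all(F1 <= F2)
largestGap   = max(F2 - F1)   % about 0.3 for these data
```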

Example: 'Tail','larger'

## Output Arguments


### h — Hypothesis test result

1 | 0

Hypothesis test result, returned as a logical value.

• If h = 1, the test rejects the null hypothesis at the Alpha significance level.

• If h = 0, the test fails to reject the null hypothesis at the Alpha significance level.

### p — Asymptotic p-value

scalar value in the range (0,1)

Asymptotic p-value of the test, returned as a scalar value in the range (0,1). p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. The asymptotic p-value becomes very accurate for large sample sizes, and is believed to be reasonably accurate for sample sizes n1 and n2 such that $(n_1 n_2)/(n_1 + n_2) \ge 4$.
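This page does not state which asymptotic approximation kstest2 uses. The sketch below assumes the limiting Kolmogorov distribution, with a commonly used small-sample correction applied to the scaled statistic, and computes the effective sample size from the accuracy guideline above. All names are hypothetical, and the formula is an assumption rather than the documented implementation. With D = 0.28 and n1 = n2 = 50 (the values from the Weibull examples), it gives p-values of about 0.0158 (one-sided) and 0.0317 (two-sided), the same values printed in those examples.

```matlab
n1 = 50;   % sample sizes from the Weibull examples above
n2 = 50;
D  = 0.28; % test statistic k from the One-Sided Hypothesis Test example

% Effective sample size; the guideline above asks for ne >= 4.
ne = n1*n2/(n1 + n2);
isLargeEnough = ne >= 4

% Assumed small-sample correction applied to the scaled statistic.
lambda = max((sqrt(ne) + 0.12 + 0.11/sqrt(ne))*D, 0);

% One-sided: limiting tail probability exp(-2*lambda^2).
pOneSided = exp(-2*lambda^2)

% Two-sided: alternating Kolmogorov series, truncated after 100 terms
% and clipped to [0, 1].
j = (1:100)';
pTwoSided = 2*sum((-1).^(j-1) .* exp(-2*lambda^2*j.^2));
pTwoSided = min(max(pTwoSided, 0), 1)
```

The series converges very quickly for moderate lambda, so truncating it is a practical choice. Only the first term matters at this level of precision here.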

### ks2stat — Test statistic

nonnegative scalar value

Test statistic, returned as a nonnegative scalar value.

## More About

### Two-Sample Kolmogorov-Smirnov Test

The two-sample Kolmogorov-Smirnov test is a nonparametric hypothesis test that evaluates the difference between the empirical cdfs of the two sample data vectors over the range of x values in the two data sets.

The two-sided test uses the maximum absolute difference between the empirical cdfs of the two data vectors. The test statistic is

$D^{*} = \max_{x} \left( \left| \hat{F}_{1}(x) - \hat{F}_{2}(x) \right| \right),$

where $\hat{F}_{1}(x)$ is the proportion of x1 values less than or equal to x and $\hat{F}_{2}(x)$ is the proportion of x2 values less than or equal to x.
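A minimal sketch of the two-sided statistic, computed directly from this definition in base MATLAB (implicit expansion, R2016b or later). ecdfAt is a hypothetical helper, not a toolbox function. The sketch relies on the fact that both empirical cdfs are step functions that change only at observed values, so the maximum over x is attained at one of the pooled sample points. For the same data it would be expected to match the ks2stat output of a default two-sided kstest2 call, but that comparison is not shown here.

```matlab
% Small hypothetical samples.
x1 = [1 3];
x2 = [2 4];

% Hypothetical helper: proportion of s values less than or equal to each x.
ecdfAt = @(s, x) mean(s(:) <= x(:).', 1);

% The empirical cdfs only jump at observed values, so checking the
% pooled sample points is enough to find the maximum over x.
xGrid = unique([x1(:); x2(:)]);
F1 = ecdfAt(x1, xGrid);   % [0.5 0.5 1 1]
F2 = ecdfAt(x2, xGrid);   % [0 0.5 0.5 1]

% Two-sided statistic: largest absolute gap between the two step functions.
Dstar = max(abs(F1 - F2))
```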

The one-sided test uses the signed difference between the empirical cdfs of the two data vectors rather than its absolute value. The test statistic is

$D^{*} = \max_{x} \left( \hat{F}_{1}(x) - \hat{F}_{2}(x) \right).$
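The same approach gives the one-sided statistic by dropping the absolute value. The sketch below computes max(F1 - F2) from the formula above, and also the mirrored quantity max(F2 - F1), on the assumption (not stated on this page) that the opposite one-sided alternative uses the reversed difference. The samples are hypothetical and have different lengths, which the test allows. ecdfAt is a hypothetical helper.

```matlab
% Hypothetical samples of different sizes.
x1 = [1 2 5];
x2 = [3 4];

% Hypothetical helper: proportion of s values less than or equal to each x.
ecdfAt = @(s, x) mean(s(:) <= x(:).', 1);

xGrid = unique([x1 x2]);
F1 = ecdfAt(x1, xGrid);
F2 = ecdfAt(x2, xGrid);

% Largest amount by which F1 exceeds F2 (the formula above).
Dstar12 = max(F1 - F2)   % 2/3 for these data

% Largest amount by which F2 exceeds F1 (assumed mirror for the other tail).
Dstar21 = max(F2 - F1)   % 1/3 for these data
```

Neither one-sided value can be negative, because both empirical cdfs equal 1 at the largest pooled value, where the difference is 0.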

## Algorithms

In kstest2, the decision to reject the null hypothesis is based on comparing the p-value p with the significance level Alpha, not on comparing the test statistic ks2stat with a critical value.
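This decision rule amounts to a single comparison. The sketch below applies it to the p-value 0.0317 from the example that tests at the 1% significance level. decide is a hypothetical helper, and treating p equal to Alpha as a rejection is an assumption, since this page does not say how ties are handled.

```matlab
% p-value reported in the 1% significance level example.
p = 0.0317;

% Hypothetical helper: reject H0 when p does not exceed alpha.
% (Rejecting on a tie p == alpha is an assumption.)
decide = @(p, alpha) p <= alpha;

h05 = decide(p, 0.05)   % true: rejects at the 5% level
h01 = decide(p, 0.01)   % false: does not reject at the 1% level
```

The result at Alpha = 0.01 matches h = 0 in that example, and the result at 0.05 is consistent with h = 1 in the first example, which tests the same data at the default level.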
