## Hypothesis Test Assumptions

Different hypothesis tests make different assumptions about the distribution of the random variable from which the data are sampled. These assumptions must be considered both when choosing a test and when interpreting its results.

For example, the z-test (`ztest`) and the t-test (`ttest`) both assume that the data are independently sampled from a normal distribution. Statistics and Machine Learning Toolbox™ functions are available for testing this assumption, such as `chi2gof`, `jbtest`, `lillietest`, and `normplot`.

Both the z-test and the t-test are relatively robust to departures from this assumption, so long as the sample size n is large enough. Both tests compute a sample mean $\overline{x}$, which, by the Central Limit Theorem, has an approximately normal sampling distribution with mean equal to the population mean μ, regardless of the population distribution being sampled (provided that distribution has finite variance).
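The Central Limit Theorem behavior described above can be checked with a small simulation. The following is a minimal sketch in Python using only the standard library, not a toolbox function; the helper name `sample_means`, the exponential population, the sample size of 50, and the seed are all assumptions chosen for illustration.

```python
import math
import random
import statistics

def sample_means(n, trials, seed=0):
    """Return the means of `trials` samples of size n drawn from an
    exponential population with mean 1 and standard deviation 1."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

n = 50
means = sample_means(n=n, trials=5000)

# The sampling distribution of the mean should be centered on the
# population mean (1) with standard error sigma / sqrt(n).
center = statistics.fmean(means)
spread = statistics.stdev(means)
standard_error = 1.0 / math.sqrt(n)

# If the sample means are roughly normal, about 95% of them fall
# within 1.96 standard errors of the population mean.
coverage = sum(abs(m - 1.0) < 1.96 * standard_error for m in means) / len(means)

print(center, spread, standard_error, coverage)
```

Although the exponential population is strongly skewed, the simulated sample means cluster around 1 with a spread close to 1/√n, which is the behavior that makes the two tests tolerant of non-normal data at larger sample sizes.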

The difference between the z-test and the t-test lies in what each assumes about the standard deviation σ of the underlying normal distribution. A z-test assumes that σ is known; a t-test does not. As a result, a t-test must compute an estimate s of the standard deviation from the sample.

Test statistics for the z-test and the t-test are, respectively,

$$\begin{array}{l}z=\dfrac{\overline{x}-\mu }{\sigma /\sqrt{n}}\\[2ex] t=\dfrac{\overline{x}-\mu }{s/\sqrt{n}}\end{array}$$
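As a concrete illustration of the two statistics, the sketch below evaluates both formulas on a small made-up sample. It is written in Python with the standard library only and is not how `ztest` or `ttest` are implemented; the function names, the data, and the values of μ and σ are hypothetical.

```python
import math
import statistics

def z_statistic(sample, mu, sigma):
    """z = (xbar - mu) / (sigma / sqrt(n)), with sigma assumed known."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))

def t_statistic(sample, mu):
    """t = (xbar - mu) / (s / sqrt(n)), with s estimated from the sample."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (statistics.fmean(sample) - mu) / (s / math.sqrt(n))

# Hypothetical data: sample mean is 5, hypothesized population mean is 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_statistic(sample, mu=4, sigma=2)  # sigma = 2 is an assumed known value
t = t_statistic(sample, mu=4)
print(z, t)
```

The only difference between the two functions is the denominator: the z-statistic uses the supplied σ, while the t-statistic substitutes the estimate s computed from the same data.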

Under the null hypothesis that the population is normally distributed with mean μ, the z-statistic has a standard normal distribution, N(0,1). Under the same null hypothesis, the t-statistic has Student's t distribution with n – 1 degrees of freedom. For small sample sizes, Student's t distribution has a lower peak and heavier tails than N(0,1), compensating for the decreased confidence in the estimate s. As the sample size increases, however, Student's t distribution approaches the standard normal distribution, and the two tests become essentially equivalent.
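The relationship between the two null distributions can be seen by comparing their densities directly. The sketch below, in Python with the standard library only, evaluates the textbook Student's t density at the center and in the tail for a few degrees of freedom; the helper name `t_pdf` and the chosen evaluation points are assumptions for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, dof):
    """Density of Student's t distribution with `dof` degrees of freedom."""
    log_coef = (
        math.lgamma((dof + 1) / 2)
        - math.lgamma(dof / 2)
        - 0.5 * math.log(dof * math.pi)
    )
    return math.exp(log_coef - (dof + 1) / 2 * math.log1p(x * x / dof))

std_normal = NormalDist()  # N(0, 1)

# Compare the density at the center (x = 0) and in the tail (x = 3).
for dof in (2, 10, 100):
    print(f"t, dof={dof:>3}: pdf(0)={t_pdf(0, dof):.5f}  pdf(3)={t_pdf(3, dof):.5f}")
print(f"N(0,1)    : pdf(0)={std_normal.pdf(0):.5f}  pdf(3)={std_normal.pdf(3):.5f}")
```

With few degrees of freedom, the t density is lower at the center and higher in the tail than the standard normal density, and both gaps shrink as the degrees of freedom grow.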

Knowing the distribution of the test statistic under the null hypothesis allows for accurate calculation of p-values. Interpreting p-values in the context of the test assumptions allows for critical analysis of test results.
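For the z-test, the p-value calculation follows directly from the standard normal null distribution. The following minimal sketch uses Python's standard library rather than the toolbox; the function name `two_sided_p_value` is hypothetical, and the calculation is valid only to the extent that the z-test assumptions hold.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z-statistic under the N(0, 1) null distribution:
    the probability of a statistic at least as extreme as |z| in either tail."""
    upper_tail = 1.0 - NormalDist().cdf(abs(z))
    return 2.0 * upper_tail

p_at_critical = two_sided_p_value(1.96)  # close to the conventional 0.05 level
p_at_zero = two_sided_p_value(0.0)       # no evidence against the null
print(p_at_critical, p_at_zero)
```

The corresponding calculation for a t-statistic would use the cumulative distribution function of Student's t distribution with n – 1 degrees of freedom in place of the normal one; that function is not part of the Python standard library, so it is omitted here.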

The assumptions underlying each Statistics and Machine Learning Toolbox hypothesis test are given in the reference page for the function that implements it.