Fisher’s exact test is a nonparametric
statistical test used to test the null hypothesis that no nonrandom
associations exist between two categorical variables, against the
alternative that there is a nonrandom association between the variables.
Fisher’s exact test provides an alternative to the chi-squared
test for small samples, or samples with very uneven marginal distributions.
Unlike the chi-squared test, Fisher’s exact test does not depend
on large-sample distribution assumptions, and instead calculates an
exact p-value based on the sample data. Although
Fisher’s exact test is valid for samples of any size, it is
not recommended for large samples because it is computationally intensive.
If all of the frequency counts in the contingency table are greater than or equal to 1e7, then fishertest errors. For contingency tables that contain large count values or are well-balanced, use crosstab or chi2gof instead.
fishertest accepts a 2-by-2 contingency table as input, and computes the p-value of the test as follows:
1. Calculate the sums for each row, column, and total number of observations in the contingency table.

2. Using a multivariate generalization of the hypergeometric probability function, calculate the conditional probability of observing the exact result in the contingency table if the null hypothesis were true, given its row and column sums. The conditional probability is

$$P_{\text{cutoff}} = \frac{R_1!\,R_2!\,C_1!\,C_2!}{N!\,\prod_{i,j} n_{ij}!},$$

where R1 and R2 are the row sums, C1 and C2 are the column sums, N is the total number of observations in the contingency table, and nij is the value in the ith row and jth column of the table.

3. Find all possible matrices of nonnegative integers consistent with the row and column sums. For each matrix, calculate the associated conditional probability using the equation for Pcutoff.

4. Use these values to calculate the p-value of the test, based on the alternative hypothesis of interest.

   - For a two-sided test, sum all of the conditional probabilities less than or equal to Pcutoff for the observed contingency table. This represents the probability of observing a result as extreme as, or more extreme than, the actual outcome if the null hypothesis were true. Small p-values cast doubt on the validity of the null hypothesis, in favor of the alternative hypothesis of association between the variables.

   - For a left-sided test, sum the conditional probabilities of all the matrices with a (1,1) cell frequency less than or equal to n11 in the observed contingency table.

   - For a right-sided test, sum the conditional probabilities of all the matrices with a (1,1) cell frequency greater than or equal to n11 in the observed contingency table.
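The p-value procedure described above can be sketched in a few lines of Python. This is a minimal illustration and not the fishertest implementation: the function names `table_probability` and `fisher_p_value`, the `tail` argument values, and the small relative tolerance used to treat floating-point ties as equal are all assumptions made for the example. It relies on the fact that, once the row and column sums are fixed, a 2-by-2 table is determined by its (1,1) cell.

```python
from math import comb


def table_probability(n11, r1, r2, c1):
    """Conditional probability of the 2x2 table whose (1,1) cell is n11,
    given row sums r1, r2 and first-column sum c1."""
    # Algebraically equal to R1! R2! C1! C2! / (N! * n11! n12! n21! n22!).
    return comb(r1, n11) * comb(r2, c1 - n11) / comb(r1 + r2, c1)


def fisher_p_value(table, tail="both"):
    """p-value for a 2x2 table of counts; tail is 'both', 'left', or 'right'."""
    (n11, n12), (n21, n22) = table
    r1, r2, c1 = n11 + n12, n21 + n22, n11 + n21

    # With the margins fixed there is one free cell; these bounds keep
    # every cell of the candidate table nonnegative.
    lowest, highest = max(0, c1 - r2), min(r1, c1)
    probs = {k: table_probability(k, r1, r2, c1)
             for k in range(lowest, highest + 1)}
    p_cutoff = probs[n11]  # probability of the observed table

    if tail == "left":
        p = sum(q for k, q in probs.items() if k <= n11)
    elif tail == "right":
        p = sum(q for k, q in probs.items() if k >= n11)
    elif tail == "both":
        # Assumed tolerance so that floating-point ties count as equal.
        p = sum(q for q in probs.values() if q <= p_cutoff * (1 + 1e-7))
    else:
        raise ValueError("tail must be 'both', 'left', or 'right'")
    return min(p, 1.0)


observed = [[3, 1], [1, 3]]
print(fisher_p_value(observed, "both"))   # 34/70, about 0.4857
print(fisher_p_value(observed, "left"))   # 69/70, about 0.9857
print(fisher_p_value(observed, "right"))  # 17/70, about 0.2429
```

For this table the margins are all 4 and N is 8, so the five possible tables have probabilities 1/70, 16/70, 36/70, 16/70, and 1/70. The observed table has probability 16/70, which is why the two-sided sum picks up every table except the central one.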
The odds ratio is

$$OR = \frac{n_{11}\,n_{22}}{n_{21}\,n_{12}}.$$

The null hypothesis of conditional independence is equivalent
to the hypothesis that the odds ratio equals 1. The left-sided alternative
is equivalent to an odds ratio less than 1, and the right-sided alternative
is equivalent to an odds ratio greater than 1.
The asymptotic 100(1 – α)% confidence interval for the odds ratio is

$$CI = \left[\exp\!\left(L - \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right) SE\right),\ \exp\!\left(L + \Phi^{-1}\!\left(1 - \tfrac{\alpha}{2}\right) SE\right)\right],$$

where L is the log odds ratio, Φ⁻¹(•) is the inverse of the standard normal cumulative distribution function, and SE is the standard error for the log odds ratio. If the 100(1 – α)% confidence interval does not contain the value 1, then the association is significant at the α significance level. If any of the four cell frequencies is 0, then fishertest does not compute the confidence interval and instead displays [-Inf Inf].
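As a rough illustration of the odds ratio and its asymptotic confidence interval, the following Python sketch computes both for a 2-by-2 table. It is not the fishertest implementation. The function name `odds_ratio_ci` is hypothetical, the standard error uses the usual large-sample estimate sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22), which the text above does not spell out, and the odds ratio value returned when a cell is 0 is an assumption of this example.

```python
from math import exp, inf, log, nan, sqrt
from statistics import NormalDist


def odds_ratio_ci(table, alpha=0.05):
    """Return (odds_ratio, (lower, upper)) for a 2x2 table of counts."""
    (n11, n12), (n21, n22) = table
    num, den = n11 * n22, n21 * n12

    if 0 in (n11, n12, n21, n22):
        # The interval is not computed when a cell is 0; report [-Inf Inf].
        # The odds ratio returned here is this sketch's own convention.
        odds = inf if den == 0 and num > 0 else (0.0 if den > 0 else nan)
        return odds, (-inf, inf)

    odds = num / den
    log_odds = log(odds)                              # L
    se = sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # assumed SE estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)           # inverse standard normal CDF
    return odds, (exp(log_odds - z * se), exp(log_odds + z * se))


odds, (lower, upper) = odds_ratio_ci([[3, 1], [1, 3]])
print(odds)                 # 9.0
print(lower < 1 < upper)    # True: not significant at the 5% level
```

Because the interval is symmetric on the log scale, the geometric mean of its endpoints equals the odds ratio, which gives a quick sanity check on any implementation.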
fishertest only accepts 2-by-2 contingency tables as input. To test the independence of categorical variables with more than two levels, use the chi-squared test provided by crosstab.